UNDERSTANDING DATA STRUCTURES

AD-A008 937

Rob Gerritsen

Carnegie-Mellon University

UNDERSTANDING DATA STRUCTURES
by
Rob Gerritsen
February 1975

DEPARTMENT of COMPUTER SCIENCE


Carnegie-Mellon University


Reproduced by
NATIONAL TECHNICAL INFORMATION SERVICE
US Department of Commerce
Springfield, VA 22151


UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE

1. REPORT NUMBER:
2. GOVT ACCESSION NO.:
3. RECIPIENT'S CATALOG NUMBER:
4. TITLE: UNDERSTANDING DATA STRUCTURES
5. TYPE OF REPORT & PERIOD COVERED: Interim
6. PERFORMING ORG. REPORT NUMBER:
7. AUTHOR(s): Rob Gerritsen
8. CONTRACT OR GRANT NUMBER(s): F44620-73-C-0074
9. PERFORMING ORGANIZATION NAME AND ADDRESS:
   Carnegie-Mellon University
   Department of Computer Science
   Pittsburgh, PA 15213
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS:
   61101D
   A02466
11. CONTROLLING OFFICE NAME AND ADDRESS:
   Defense Advanced Research Projects Agency
   1400 Wilson Blvd
   Arlington, VA 22209
12. REPORT DATE: February 1975
13. NUMBER OF PAGES: 230
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):
   Air Force Office of Scientific Research
   1400 Wilson Blvd
   Arlington, VA 22209
15. SECURITY CLASS. (of this report): UNCLASSIFIED
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:
16. DISTRIBUTION STATEMENT (of this Report):
   Approved for public release; distribution unlimited.
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different
    from Report):
18. SUPPLEMENTARY NOTES:
19. KEY WORDS:



UNDERSTANDING DATA STRUCTURES

by

Rob Gerritsen
February 1975


DISSERTATION

Submitted in partial fulfillment of the requirements for the degree of Doctor of
Philosophy in Industrial Administration (Systems).


Graduate School of Industrial Administration
Carnegie-Mellon University
Pittsburgh, Pennsylvania


This research was supported in part by the Advanced Research Projects Agency
of the Office of the Secretary of Defense under contract F44620-73-C-0074. A
William Larimer Mellon Fellowship provided personal funding.


TO  MY  PARENTS 

Alexander  N.  Gerritsen 
Jacqueline  K.  Gerritsen 


TABLE OF CONTENTS

ACKNOWLEDGEMENT
ABSTRACT
LIST OF ILLUSTRATIONS

1 THEORY AND SUPPORT
1.1 Introduction
1.2 Relevant Literature
1.3 DBTG Data Bases
1.4 Matrices, Hierarchies, Relationships and Networks

2 DATA MANIPULATION LANGUAGE PROGRAMMER
2.1 Introduction
2.2 The APG System
2.3 Program Generation vs a Generalized Interpreter
2.4 HI-IQ, the Query Language
2.5 Request Handler Assertions
2.6 A BNF Description of the Generated Procedure
2.7 Assertions
2.8 A Frame for the Semantics of Data Structures
2.9 Efficiency Considerations
2.10 Algol to COBOL Conversion
2.11 Examples of Procedure Generation

3 DESIGN OF DATA STRUCTURES
3.1 Introduction
3.2 Use of the Programmer
3.3 Automatic Data Structure Design
3.4 Defining Item Names
3.5 Generating Assertions
3.6 Designing Record Relationships
3.7 A Frame for Record Relationship Design
3.8 Designing Record Contents
3.9 Example
3.10 Alternative Implementation

4 EXTENSIONS
4.1 Extensions to the DMLP
4.2 Extensions to the Data Structure Designer
4.3 Automation of Data Base Update
4.4 Data Base Restructuring

5 COST EFFECTIVENESS OF AUTOMATIC PROGRAMMING
5.1 Measures of Automatic Program Generation
5.2 Cost Factors in Programming
5.3 Comparison of Dollar Costs
5.4 Execution Costs
5.5 Cost of Data Structure Design

6 ON THE APPLICATION OF ARTIFICIAL INTELLIGENCE
6.1 A Data Management Application
6.2 Acquisition and Representation of Knowledge

7 THE RELATIONAL AND NETWORK MODELS OF DATA BASES
7.1 Introduction
7.2 Levels of Data Structure Description
7.3 Translation Approach
7.4 Advantages of Translation
7.5 Two Implemented Translators

BIBLIOGRAPHY


ACKNOWLEDGEMENT

I thank Professor Jack R. Buchanan, my thesis supervisor and chairman of the
committee; his suggestions and direction were essential throughout. I also thank
Professor Charles H. Kriebel and Professor Herbert A. Simon, members of my
thesis committee, for their excellent teaching and advice. Finally, I am deeply
indebted to my wife, Joyce, who not only provided encouragement but also did
most of the typing and proofreading.


ABSTRACT

Data management programmers are finding their jobs are getting tougher
because of the gradual replacement of sequential data bases by network data
bases. In addition, there is a new job called "Data Administrator" for handling
the data structure problems associated with network data bases. The goal of
this thesis is reduction of these data management tasks by developing and
applying a practical theory of data structure. To insure the practical flavor of
this research, the Data Base Task Group (DBTG) report has been selected as the
specification of the data management system in which the applications function.

An implemented system that automates both information retrieval and data base
design demonstrates the application of the theory. To do this, the theory is
captured in a Frame, a set of formal rules in the logic of programs. An
Automatic Programming Generator (APG) compiles this Frame to an operating
program. Two Frames, resulting in two programs, are discussed. One of these
programs generates information retrieval procedures. The other program
designs a data base structure.

Both of these programs accept relational queries (expressed in the HI-IQ
language) as input. These programs can be viewed as translators from
relational descriptions to access path descriptions.

Programming and data structure design are cerebral tasks. Deriving the Frames
whereby the programs understand data structure so that these tasks can be
replicated was difficult. These difficulties are discussed for the benefit of
others who want to apply Artificial Intelligence.

This research is justified in practical terms: A comparison of manual and
automatic programming costs shows a potential cost reduction of up to 98% for
automatic programming.


LIST OF ILLUSTRATIONS

1-1 Applications of data structure theory.
1-2 Simple doctor-patient data base.
1-3 Better DOCTOR-PATIENT data base.
1-4 Simple tabular reports.
1-5 Hierarchical report.
1-6 Hierarchical report (inverted from Figure 1-5).
2-1 Simple query with generated procedure.
2-2 System flows.
2-3 Average cost per usage for generated and interpreted programs in a
    production environment.
2-4 BNF for the HI-IQ language.
2-5 Two examples of retrieval conditions.
2-6 Query (P1) for the report of Figure 2-7.
2-7 A hierarchical-matrix report.
2-8 Templates for Request Handler assertions.
2-9 Possible values of the TYPE parameter in LINKS.
2-10 Assertions describing the query of Figure 2-6.
2-11 A BNF description of the generated procedure.
2-12 Code generated for the condition of Figure 2-5.
2-13 Assertions used to describe the data base.
2-14 Assertions which indicate the results of single program statements.
2-15 Other assertions.
2-16 Assertions used to describe program blocks.
2-17 Assertions evaluated by LISP.
2-18 Standard LISP and Micro-Planner predicates used in the rules.
2-19 Index to Figures 2-8 and 2-13 through 2-18.
2-20 Operator (type S1) rules.
2-21 The iteration rule.
2-22 Type S3 and S4 rules.
2-23 Rule and production correspondences.
2-24 Area search vs a SYSTEM owned set search.
2-25 Program P1 (Figure 2-6) in Algol.
2-26 Program P1 in COBOL.
2-27 Index to the examples.
2-28 A community medical data base structure.
2-29 A sales data base structure.
2-30 Query P2.
2-31 Program P2.
2-32 Query P3.
2-33 Program P3.
2-34 Query P4.
2-35 Program P4.
2-36 Query P5.
2-37 Program P5.
2-38 Query P6.
2-39 Program P6.
3-1 A simple user-programmer-DBA system.
3-2 Data base structure designer; general program flow.
3-3 Assertions and their meanings.
3-4 Transformation of a confluent hierarchy within the request context.
3-5 Construction of record relationships.
3-6 Rules for generating record relationships.
3-7 Assertions and their interpretations.
3-8 Assertions before and after application of the first three rules.
3-9 Some interesting structure transformations.
3-10 Determination of record content - general flow.
3-11 Rules for establishing record contents.
3-12 Query number one.
3-13 Query number two.
3-14 Query number three.
3-15 Query number four.
3-16 Query number five.
3-17 Query number six.
3-18 Assertions generated from queries 1-6.
3-19 Generated data base structure.
3-20 Data structure diagram for Figure 3-19.
4-1 Recursive network structure.
5-1 Cost of automatic program generation.
5-2 COBOL cost factors for 1000 source lines.
5-3 % Cost for 1000 lines of COBOL in 1974.
5-4 Cost of generating a design.


1 THEORY AND SUPPORT.

1.1 Introduction.

Data management programmers are finding their jobs are getting tougher
because of the gradual replacement of sequential data bases by network data
bases. In addition, there is a new job called "Data Administrator" [CODASYL
1971a] for handling the data structure problems associated with network data
bases. The goal of this thesis is reduction of these data management tasks by
developing and applying a practical theory of data structure. To insure the
practical flavor of this research, I have selected the Data Base Task Group
(DBTG) [CODASYL 1971a] report as the specification of the data management
system in which the applications function.

An implemented system that automates both information retrieval and data base
design demonstrates the application of the theory. To do this, the theory is
captured in a Frame, a set of formal rules in the logic of programs. An
Automatic Programming Generator (APG) compiles this Frame to an operating
program. I will discuss two Frames that resulted in two programs. One of
these programs generates information retrieval programs. The other program
designs a data base structure.


[Figure 1-1 diagram: boxes labeled QUERY, DDL DESCRIPTION, DMLP, COBOL/DML
PROCEDURE, and STRUCTURE DESIGNER.]

Figure 1-1. Applications of data structure theory.


Figure 1-1 illustrates these two programs, a Data Manipulation Language
Programmer (DMLP) and a structure designer. The DMLP translates a query to a
COBOL procedure augmented by Data Manipulation Language (DML). To do this
it uses a Data Definition Language (DDL) [CODASYL 1971a] description of the
data base structure.

The DMLP is data base independent. The Frame that defines the DMLP contains
only general knowledge of data structures and programming. Specific
knowledge of a particular data base is contained in the DDL description.



Note that the DMLP does not perform the information retrieval described in the
query. It only generates a procedure that will retrieve the desired information
when it is executed.

The other program in Figure 1-1 generates a DDL description from a set of
queries; it tries to find a design that captures all data relationships implied by
the queries.

The logical structure of the data base is of primary concern in this thesis.
Actual physical mapping of the data to that structure will not be considered.
Nor does the system consider certain data characteristics including data type,
retrieval frequency, update volatility and data volume.

The accomplishments of this thesis are as follows:

Chapter 1: presents general concepts (theory) of network
data structures.

Chapter 2:
(a) presents a Frame, a formal (axiomatic)
representation of programming for DBTG data bases,
and HI-IQ, an interactive query language.
(b) discusses compilation of (a) by the APG
enabling it to write information retrieval procedures.

Chapter 3:
(a) presents a new algorithm for DBTG data base
design including a Frame, a formal (axiomatic)
representation of the design process.
(b) discusses compilation of (a) by the APG
enabling it to design data structures.



Chapter 4: suggests further research including extension of
the techniques used in Chapters 2 and 3 to other
data management tasks.

Chapter 5: demonstrates the economic validity of the
approach - a potential programming cost reduction
of up to 98%.

Chapter 6: presents some remarks on the difficulties
encountered in this research and the expected
impact on related research.

Chapter 7: claims that this research bridges the gap between
the relational and network models of data bases.



1.2 Relevant Literature.

This work is related to:

a) Automatic and/or structured programming,
b) Data bases and their implementation,
c) Non-procedural information retrieval.

The following discussion will proceed in the above sequence. Obviously there is
no clean boundary between some of these fields: Category "c" bridges
categories "a" and "b".

1.2.1 Automatic and structured programming.

Automatic programming systems have always used theorem provers of some
sort. PROW [Waldinger and Lee 1969] is a clear example, being based on a
predicate calculus theorem prover. Preceding this effort by several years was
Simon's Heuristic Compiler [Simon 1961; also in Simon and Siklossy 1972] which
was based on the General Problem Solver [Newell, Shaw and Simon 1960].
Although the terminology used by Simon does not imply the use of a theorem
prover, it is not very difficult to view GPS as a theorem prover: The "Means-End
Analysis" done by GPS is goal driven as are some theorem provers; GPS
operators, which may be viewed as rules of inference, achieve the goal by
removing differences.

Hewitt [Hewitt 1971] proposed a programming language called PLANNER in which
the rules for proving theorems (or achieving goals) may be expressed as part of
the program. This facilitates the implementation of theorem provers in other
problem domains (such as automatic programming or natural language
understanding) because it removes a level of interpretation. Hewitt's ideas
were implemented in a programming language called Micro-Planner [Sussman and
Winograd 1972].

Hoare's development of a system of logic for the definition of the semantics of
programming languages [Hoare 1969] has proven useful for automatic
programming. So has the concept of structured programming [Dahl, Dijkstra and
Hoare 1972]. The logical basis of the automatic programming system reported
in [Buchanan and Luckham 1974; Buchanan 1974] was described using the Hoare
logic. This system contains a compiler that translates a non-procedural
definition of a programming environment (whose elements correspond in form to
statements in Hoare's logic of programs) into Micro-Planner theorems and LISP
functions.

A program resulting from such a compilation is a program generator that is
capable of generating specific programs satisfying given sets of input-output
assertions. The program generator of Chapter Two and the data base designer
of Chapter Three were formed by an extended version of the aforementioned
system.



1.2.2 Data bases and their implementation.

The retrieval programs that can be generated are capable of retrieving
information from data bases which allow the storage of highly inter-related data.
Such data bases are typically called network, hierarchical, or relational data
bases, and they are quite different from the set of sequential files that
traditionally formed a data base.

There exists some disagreement [Codd and Date 1974; Date and Codd 1974;
Jardine 1974] regarding differences/similarities of the network and relational
models of data bases. Some comments relevant to this discussion and the
impact of this thesis on the disagreement are presented in Chapter Seven. It is
generally agreed, however, that hierarchical data bases are a subclass of the
network type.

The network data base was commercially developed at General Electric
[Bachman and Williams 1964; Bachman 1965; General Electric 1965]. A specific
notation for describing the structure of a network data base has been
developed [Bachman 1969]. An attempt at standardization of this type of data
base was made by the Data Base Task Group [CODASYL 1971a]. Several
implementations of this standardized design are commercially available,
for example [Xerox 1970]. A generalized hierarchical data base system has
been implemented at Bell Laboratories [Gibson and Stockhausen 1973].



In a network data base such as the DBTG specification, the relations between
data must be explicitly named and defined. The DBTG terminology for a relation
is a SET. The elements of a SET are linked, hence the "network" characteristic
of such data bases. Codd [Codd 1970, 1972a, 1972b] and Childs [Childs 1968]
have developed what is termed the relational model of data bases. In this
model a relation is a defined template of related data elements. The user of a
relational data base defines elements and templates rather than elements and
the links between them as must be defined for a DBTG type data base.
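
The contrast between templates and links can be illustrated with a minimal
sketch. It is not taken from either model's specification: the relation name,
the sample DOCNO and PATNO values, and both function names are assumptions
chosen only for illustration.

```python
# Relational view: a relation is a template (DOCNO, PATNO) filled in by tuples.
# The values below are made-up sample data.
TREATING_RELATION = [(100, 1), (100, 2), (200, 3)]


def patients_relational(docno):
    """Select the PATNO of every tuple whose DOCNO matches."""
    return [patno for d, patno in TREATING_RELATION if d == docno]


# Network view: the same facts held as explicit, named links that lead from
# each owner (a doctor) to its members (that doctor's patients).
TREATING_LINKS = {100: [1, 2], 200: [3]}


def patients_network(docno):
    """Follow the named TREATING links that start at one owner."""
    return list(TREATING_LINKS.get(docno, []))
```

Both functions answer the same question; the difference lies in what the
designer has to declare beforehand, a tuple template in the first case and a
named link in the second.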

A discussion of other data base implementations can be found in [CODASYL
1971b]. Some of the systems discussed in the CODASYL report utilize
sequential or simple hierarchical data bases that do not present many of the
complexities encountered with more structured data bases. Information
retrieval systems based on the relational model have been implemented at IBM
[Boyce, Chamberlin, King and Hammer 1973].

1.2.3 Non-procedural information retrieval.

"Non-procedural", "problem statement", and "goal oriented" languages are often
discussed for information retrieval because, it is hoped, such a language will
allow the information user direct access to the data without the intervention of
a programmer to translate his needs to a computer procedure (e.g. the
retrieval program). Many such non-procedural languages have been developed
and some of the better known are discussed in [CODASYL 1971b] and in
[Teichroew 1970]. Of the languages discussed in these reports the ones that
are non-procedural are associated with the simpler, sequential data base
structures.

For example, MARK IV [Informatics 1969; Postley 1969], probably the most
commercially successful of the systems that specify a non-procedural language,
is a forms oriented system that retrieves information based on a specification of
the desired report. This system is limited to the sequential processing of files
and does not allow inter-record structures although intra-record hierarchies are
allowed.

As for languages for more complex data bases, Early [Early 1972] discusses the
idea of language stratification. That is, a retrieval can be specified in machine
language, access path language, or a relational language. Using Early's
terminology, HI-IQ is a relational language, and the DMLP is a translator from a
relational language to an access path language (COBOL/DML). Further
translation from the access path language to a machine language is accomplished
by the DBTG implementation, usually via subroutine calls to a runtime library or
with macro expansion during compilation.

The problem of translating from a non-procedural to a procedural language has
also been discussed by Teichroew [Teichroew and Sayani 1971]. An example of
such a translator for a hierarchical system was implemented at Bell Laboratories
[Puerling and Roberts 1973].

The idea of processing a network by attacking a single hierarchy at a time
comes from the work of Lavallee and others [Lavallee, Ohayon and Sauvain
undated]. Their work also introduces the important concept of confluent
hierarchies, the building blocks of network data bases.

1.2.4 Data base design.

[Teichroew and Sayani 1971] briefly discuss the goal of automating the data
base design task. With the exception of the program discussed in Chapter 3,
little progress to date has been reported.



1.3 DBTG Data Bases.

The CODASYL Data Base Task Group [CODASYL 1971a] specified a direct access
data base management system of the network type. This is a so called
"general" system, the idea being that many different specific data management
applications will be implemented that make use of the general system. General
systems very similar to the DBTG specification have been implemented by
Honeywell Information Systems, called Integrated Data Store (IDS) [General
Electric 1965], and by Xerox, called Data Management System (DMS) [Xerox
1970].

The DBTG report specifies two user interfaces. One is the Data Definition
Language (DDL) whereby the data base structure is described in a Schema; the
other is the Data Manipulation Language (DML) whereby actual data access and
storage is achieved. The Schema functions as a map for the data management
library routines. These routines are called by the DML statements in the
application program.

DBTG data base structure is defined in terms of "records", "items", "sets",
"groups", and "areas". Records consist of a concatenation of items. The set is
a relation between records. A set always has a unique owner record and one
or more member records. A set defines a one-to-many relationship in the data
base between the owner record and member records. A particular set
occurrence in the data base includes at most one occurrence of the owner
record. A group is a subdivision of a record, i.e. a smaller concatenation of
items. An area is a subdivision of the data base.
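
As a rough illustration of these terms, the following sketch models a record
occurrence as a named collection of items and a set occurrence as one owner
with a list of members. The DOCTOR, PATIENT, TREATING and DOCNO names are
borrowed from the doctor-patient example used later in this section; the class
names, the NAME item and the PATNO item are assumptions made for illustration
and are not DBTG syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One record occurrence: a record type name plus its item values."""
    rtype: str
    items: dict


@dataclass
class SetOccurrence:
    """One occurrence of a set: a single owner and any number of members."""
    set_name: str
    owner: Record
    members: list = field(default_factory=list)

    def insert(self, member: Record) -> None:
        # Adding a member extends the "many" side of the one-to-many relation.
        self.members.append(member)


# Hypothetical sample data.
doctor = Record("DOCTOR", {"DOCNO": 100, "NAME": "Smith"})
treating = SetOccurrence("TREATING", owner=doctor)
treating.insert(Record("PATIENT", {"PATNO": 1}))
treating.insert(Record("PATIENT", {"PATNO": 2}))
```

The one-to-many shape is visible in the types: `owner` holds a single record
while `members` is a list.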

Record access in a DBTG system occurs in one of three ways: direct, calculated,
or via a set relationship. A record can be directly accessed if its storage
address is known. Calculated access is possible if the values of those items
from which the storage address can be calculated are known. Records for
which calculated access is desired must be so specified when the records are
defined in the Schema. The items that are to be used in the calculation must
also be defined in the Schema. Set relationship accesses allow the finding of an
owner of a set previously accessed via one of its members or the finding of the
first, last, next, or prior member of a set previously accessed.
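
The three access modes can be sketched with a toy in-memory store. This is
only an assumption-laden illustration, not a DBTG implementation: the class
`TinyStore` and all of its method names are hypothetical, a list position
stands in for a storage address, and a dictionary stands in for the address
calculation a real system would perform.

```python
class TinyStore:
    """Toy store with direct, calculated and set-relationship access."""

    def __init__(self):
        self.records = []     # storage address = position in this list
        self.calc_index = {}  # (record type, key value) -> storage address
        self.members = {}     # (set name, owner address) -> member addresses
        self.owner = {}       # (set name, member address) -> owner address

    def store(self, rtype, items, calc_key=None):
        addr = len(self.records)
        self.records.append((rtype, items))
        if calc_key is not None:
            # A real system would compute the address from the key value.
            self.calc_index[(rtype, items[calc_key])] = addr
        return addr

    def connect(self, set_name, owner_addr, member_addr):
        self.members.setdefault((set_name, owner_addr), []).append(member_addr)
        self.owner[(set_name, member_addr)] = owner_addr

    def find_direct(self, addr):
        """Direct access: the storage address is already known."""
        return self.records[addr]

    def find_calc(self, rtype, key_value):
        """Calculated access: derive the address from a key item value."""
        return self.calc_index.get((rtype, key_value))

    def find_owner(self, set_name, member_addr):
        """Set access: from a member to its owner."""
        return self.owner.get((set_name, member_addr))

    def find_first(self, set_name, owner_addr):
        """Set access: from an owner to its first member, if any."""
        chain = self.members.get((set_name, owner_addr), [])
        return chain[0] if chain else None

    def find_next(self, set_name, member_addr):
        """Set access: from one member to the next, or None at the end."""
        chain = self.members[(set_name, self.owner[(set_name, member_addr)])]
        i = chain.index(member_addr) + 1
        return chain[i] if i < len(chain) else None


db = TinyStore()
d = db.store("DOCTOR", {"DOCNO": 100}, calc_key="DOCNO")
p1 = db.store("PATIENT", {"PATNO": 1})
p2 = db.store("PATIENT", {"PATNO": 2})
db.connect("TREATING", d, p1)
db.connect("TREATING", d, p2)
```

Only the DOCTOR record is given a calculated key here, mirroring the point
that calculated access must be declared when the record is defined.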

Data Structure (DS) diagrams illustrate the structure of DBTG data bases
[Bachman 1969]. In these diagrams, boxes are used to represent records and
arrows represent sets. The arrow always emanates from the owner of the set
and points to the set member. For example, Figure 1-2 illustrates a simple data
base structure defined with two record types and one set. Although this data
base can contain many DOCTOR records, each TREATING set occurrence will
contain exactly one DOCTOR record. On the other hand, the set occurrence may
contain many PATIENT records, one record for each patient being treated by the
doctor specified in the DOCTOR record.



Figure 1-2. Simple DOCTOR-PATIENT data base.


In these diagrams, underlined items represent calculated keys. Therefore a
DOCTOR record can be retrieved using a DOCNO value.

The data base structure in Figure 1-2 does not allow more than one doctor per
patient unless patient records are duplicated. This problem does not occur with
the data base of Figure 1-3.





[Figure 1-3 diagram: boxes PATIENT (PATNO) and DOCTOR (DOCNO), with arrows
labeled TREATMENTS and TREATING pointing to a box TREATMENT.]

Figure 1-3. Better DOCTOR-PATIENT data base.


This structure, called a confluent hierarchy [Lavalle et al undated], captures a
two-way hierarchy. In Figure 1-3 a patient may have many associated
treatments and therefore many associated doctors, and a doctor may be
performing many treatments, each with an associated patient.

The  DDL  description  of  the  structure  of  a data  base  contains  the  same 
information  as  a data  structure  diagram. 

A programmer  uses  DML  commands  to  move  through  the  data  base.  For 
example,  to  find  the  doctor  with  identifying  number  100,  the  program  would 
contain  the  following  COBOL  and  DML  statements  (for  the  data  base  of  Figure 
1-2): 

MOVE 100 TO DOCNO.

FIND DOCTOR RECORD.




Executing the statement

FIND FIRST PATIENT RECORD OF TREATING SET
will locate a patient being treated by doctor 100, if it is executed following the
statement that found the doctor. The statement

FIND NEXT PATIENT RECORD OF TREATING SET
can then be repeatedly executed to find all of the patients of doctor 100.

The user program must check an error code to determine when all patient
records in the set have been found.

Notice that there is a context which affects communication between the program
and the DBTG system. The DBTG report calls the context "current". When the
current owner of the TREATING set is doctor 100, then all commands to find
patients in the TREATING set will find only patients of doctor 100.
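
The navigation just described can be mimicked with a small in-memory model. The following Python sketch is illustrative only: the class TreatingSet, its method names, and the sample patients are assumptions made for this example and are not part of the DBTG specification. It shows how the "current" owner of the set restricts FIND FIRST and FIND NEXT, and how an end-of-set signal (here None) plays the role of the error code.

```python
# Hypothetical in-memory stand-in for one DBTG set type (TREATING).
class TreatingSet:
    def __init__(self, members_by_owner):
        # members_by_owner: DOCNO -> ordered list of patient names (assumed data)
        self.members_by_owner = members_by_owner
        self.current_owner = None   # owner last found ("current of set")
        self.position = None        # index of the member last found

    def find_doctor(self, docno):
        """Plays the role of: MOVE docno TO DOCNO. FIND DOCTOR RECORD."""
        if docno not in self.members_by_owner:
            return False            # non-zero error code
        self.current_owner = docno
        self.position = None
        return True

    def find_first_patient(self):
        """Plays the role of: FIND FIRST PATIENT RECORD OF TREATING SET."""
        self.position = None
        return self.find_next_patient()

    def find_next_patient(self):
        """Plays the role of: FIND NEXT PATIENT RECORD OF TREATING SET."""
        members = self.members_by_owner[self.current_owner]
        nxt = 0 if self.position is None else self.position + 1
        if nxt >= len(members):
            return None             # end of set: the program must test for this
        self.position = nxt
        return members[nxt]


def patients_of(db, docno):
    """Collect every patient in the set occurrence owned by doctor `docno`."""
    found = []
    if db.find_doctor(docno):
        patient = db.find_first_patient()
        while patient is not None:
            found.append(patient)
            patient = db.find_next_patient()
    return found


db = TreatingSet({100: ["SMITH", "JONES"], 200: ["WILLIAMSON"]})
print(patients_of(db, 100))
```

Because the owner is held as state inside the model, the loop never names doctor 100 after the initial find; this is the sense in which currency is a context shared between program and system.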




1.4 Matrices, Hierarchies, Relationships and Networks.

This section presents the theory on which valid data base programming must be
based. The knowledge needed by a programmer to understand data structures
and their transformations is set forth. These data structure transformations can
occur whenever data moves into and out of the data base.

The assumption is made that people manipulate information in either matrix or
hierarchical form (or a combination of the two). This assumption may not be
correct in all cases but within the realm of tabular reports, forms, etc. (e.g. a
business environment), this assumption holds for a large portion of the
information transfers that occur. Prose reports also follow a hierarchical
structure [Ghefler 1958]. Further evidence of this hierarchical structuring is
the decimal numbering of paragraphs in technical documents (such as this one).

A simple tabular report (Figure 1-4) can be viewed as a matrix (i.e. each line
is a row vector and each column is a column vector). Typically, each column
vector can be associated with an attribute whereas each row vector is
associated with some type of simultaneous occurrence of values for the several
attributes. Frequently such simultaneous value occurrences are called records.

In both reports the attributes (columns) have been labeled. The simultaneous
value occurrences in the first report might be "patients", those in the second
report "doctors".




    NAME          AGE   DIAGNOSIS
    SMITH          48   BOTULISM
    JONES          22   APPENDICITIS
    WILLIAMSON     31   MISCARRIAGE

    NAME          AGE   SPECIALTY
    FREDERICKS     41   G.P.
    BROWN          36   HEART
    SLENDER        52   GEN SURGERY
    BLUE           49   GYNECOLOGY

Figure 1-4. Simple tabular reports.


Note that these simple tabular (matrix) reports can be (and frequently are)
combined into hierarchically organized reports as in Figure 1-5. The
hierarchical report contains additional information about relationships between
records, information not available from either report in Figure 1-4. General
purpose systems (such as MARK IV [Informatics 1969], RPG or COBOL Report
Writer) usually allow several hierarchical levels. MARK IV, for example, allows
up to nine levels.

If one thinks of a hierarchy as an upside-down tree (with the repeating records
forming the branches), then the notions of above and below can be used for
describing the hierarchical relationships between record types. For example, in
the hierarchy present in the report in Figure 1-5, the doctor records are
hierarchically above the patient records.




    FREDERICKS     41   G.P.
        SMITH          48   BOTULISM
        JONES          22   APPENDICITIS
    BROWN          36   HEART
        SMITH          48   BOTULISM
    SLENDER        52   GEN SURGERY
        JONES          22   APPENDICITIS
    BLUE           49   GYNECOLOGY
        WILLIAMSON     31   MISCARRIAGE

Figure 1-5. Hierarchical report.


Once the existence of a hierarchy is recognized it becomes possible and
frequently useful to calculate statistics based on a hierarchy. Examples
applicable to the hierarchy illustrated in Figure 1-5 are "number of patients per
doctor", "average patient age per doctor", "maximum patient age per doctor",
etc. These statistics all become attributes of doctor records; e.g., the average
patient age attribute for doctor Fredericks has a value of 35. Notice that in a
certain sense these statistics migrate upward in the hierarchy. The information
from which they are calculated is located within a set of patient records, yet the
calculated value becomes an attribute of the doctor record that is hierarchically
above the patient record.
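
Upward migration can be made concrete with the data of Figure 1-5. The Python sketch below is only an illustration; the function name migrate_upward and the attribute names are assumptions chosen for this example. Each statistic is computed from the patient records below a doctor and stored as an attribute of that doctor.

```python
from statistics import mean

# Doctor -> (patient name, patient age) pairs, transcribed from Figure 1-5.
report = {
    "FREDERICKS": [("SMITH", 48), ("JONES", 22)],
    "BROWN": [("SMITH", 48)],
    "SLENDER": [("JONES", 22)],
    "BLUE": [("WILLIAMSON", 31)],
}


def migrate_upward(patients):
    """Summarize a set of patient records as attributes of the owning doctor."""
    ages = [age for _, age in patients]
    return {
        "patient_count": len(ages),
        "avg_patient_age": mean(ages),
        "max_patient_age": max(ages),
    }


doctor_stats = {doctor: migrate_upward(p) for doctor, p in report.items()}
print(doctor_stats["FREDERICKS"])
```

For doctor Fredericks the average of 48 and 22 is 35, the value quoted in the text.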

By introducing the concept of a universal record that is above all other records
in any hierarchy, it becomes possible to associate with the universal record all
universal attributes such as the total number of doctors (which in Figure 1-5
has a value of four), the average doctor age, and the average patient age.

Attributes also migrate downward. For example, the doctor name is also a
patient attribute; e.g., the patient Williamson has the attribute that her doctor's
name is Blue. The concept of downward migration is an important one to any
automatic system because it reduces the amount of knowledge about the
structure that a user needs to have. A system that understands downward
migration can allow one user to assume that the patient record contains the
doctor's name while another user can assume a hierarchical relationship.
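
A minimal Python sketch of downward migration follows, again using the names from Figure 1-5. The function patient_view and the field names are assumptions made for illustration. The hierarchy is flattened so that every patient occurrence carries the name of the doctor above it, which is what lets a user treat the doctor's name as if it were stored in the patient record.

```python
# Doctor -> patients hierarchy, transcribed from Figure 1-5 (names only).
hierarchy = {
    "FREDERICKS": ["SMITH", "JONES"],
    "BROWN": ["SMITH"],
    "SLENDER": ["JONES"],
    "BLUE": ["WILLIAMSON"],
}


def patient_view(hierarchy):
    """Flatten the hierarchy: each patient occurrence gains a doctor_name attribute."""
    return [
        {"patient": patient, "doctor_name": doctor}
        for doctor, patients in hierarchy.items()
        for patient in patients
    ]


rows = patient_view(hierarchy)
williamson = [row for row in rows if row["patient"] == "WILLIAMSON"]
print(williamson)
```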

It is also possible to invert a hierarchy. An example of the result of an
inversion is given in Figure 1-6, where the hierarchy of Figure 1-5 has been
inverted. Notice that the informational content of the reports in Figures 1-5
and 1-6 is identical (assuming that patients and doctors are uniquely identified
by their names), yet the usefulness of the reports (to a particular user) is
strikingly different. Similarly, the index to a book or document is an inversion
of the prose it accompanies.
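
Inversion can be sketched in a few lines of Python. The function name invert is an assumption for this example, and, as in the text, names are taken to identify records uniquely. Applying the function to the doctor-over-patient hierarchy of Figure 1-5 yields the patient-over-doctor hierarchy of Figure 1-6, and applying it twice returns the original, which illustrates that no information is lost.

```python
def invert(hierarchy):
    """Swap the roles of parent and child records in a two-level hierarchy."""
    inverted = {}
    for parent, children in hierarchy.items():
        for child in children:
            inverted.setdefault(child, []).append(parent)
    return inverted


# Doctor -> patients, transcribed from Figure 1-5 (names only).
by_doctor = {
    "FREDERICKS": ["SMITH", "JONES"],
    "BROWN": ["SMITH"],
    "SLENDER": ["JONES"],
    "BLUE": ["WILLIAMSON"],
}

by_patient = invert(by_doctor)
print(by_patient)
```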

The confluent hierarchy discussed in section 1.3 is the structure used in a
network data base to simultaneously store a hierarchy in both inverted and
original form.




    SMITH          48   BOTULISM
        FREDERICKS     41   G.P.
        BROWN          36   HEART
    JONES          22   APPENDICITIS
        FREDERICKS     41   G.P.
        SLENDER        52   GEN SURGERY
    WILLIAMSON     31   MISCARRIAGE
        BLUE           49   GYNECOLOGY

Figure 1-6. Hierarchical report (inverted from Figure 1-5).




2 DATA MANIPULATION LANGUAGE PROGRAMMER.

2.1 Introduction.

This chapter will describe a computer program, called the Data Manipulation
Language Programmer (DMLP), that can generate a COBOL Procedure Division
corresponding to an information retrieval query. The target data bases for
these queries must be of the type specified by the Data Base Task Group
(DBTG) [CODASYL 1971a]. The generated Procedure Division contains the usual
COBOL [CODASYL 1970] statements augmented by data manipulation statements
as defined by the DBTG. The DMLP does not generate a Data Division. This
task has been left to the sub-schema processor as defined by the DBTG.

A Frame (a set of formal logical rules) defines the DMLP. An Automatic
Programming Generator (APG) [Buchanan and Luckham 1974; Buchanan 1974]
translates this Frame to an operating program.

The DMLP uses two inputs to direct and control program generation. One of
these inputs is a query whereby the user specifies the information desired.
The other is a description of the data base structure: the DMLP can program for
many different data bases. An example of a simple query and the resulting
procedure is given in the next section.

2.1.1 Introductory example.

Figure 2-1 illustrates the query (written in the HI-IQ language) corresponding to
the command, "Please display the order number, order date, shipping date,
customer number and customer name for any order specified." This query was
entered interactively, and for clarity I have underlined the system prompts. (A
further explanation of HI-IQ is given in Section 2.4.) Figure 2-1 also illustrates
the data structure diagram for the relevant portion of the data base and the
Procedure Division that was generated in response to the query and the data
base description.




Inputs: (query)

    [illegible: program name and input file prompts]

    PRIMARY RECORD [illegible]
    ORDER

    CONDITIONS FOR RETRIEVAL
    (ORD-NO EQ RUNTIME)
    [illegible]

    ITEMS OR STATS TO BE DISPLAYED
    ORD-NO
    ORD-DATE
    SHIP-DATE
    CUST-NO
    CUST-NAME
    [illegible]

    POSSIBLE PORTS
    (ORD-NO)

    [illegible: port selection prompt] ORD-NO


(data structure)

Output:


PROCEDURE DIVISION.

PROC1 SECTION.

PARA-100.

    OPEN AREA MAINA .
    DISPLAY 'ORD-NO' 'EQ?' .
    ACCEPT ORD-NO .

    FIND ORDER RECORD .

    IF ERRORSTATUS IS NOT EQUAL TO 0 NEXT SENTENCE

    ELSE PERFORM PARA-101.

    CLOSE AREA MAINA .

    STOP .

PARA-101.

    GET ORDER RECORD .

    FIND OWNER RECORD OF ORDERS SET .

    GET CUSTOMER RECORD .

    DISPLAY ORD-NO ORD-DATE SHIP-DATE CUST-NO
        CUST-NAME .


Figure 2-1. Simple query with generated procedure.



At the end of the query the system has indicated that ORD-NO is a possible port.
In this query, order number qualifies as a port for two reasons: (a) It has been
defined as a calculated key in the data base description. (b) Order number is
used in an equality test in the query. It is therefore possible to directly
retrieve the order record, given the value that order number must have.

The  user  may  accept,  reject  or  select  a port  since  information  necessary  for  this 
decision  is  not  held  by  the  system.  The  generated  program  uses  the  port  to 
enter  the  data  base.  In  other  words,  the  port  is  the  start  of  the  access  path. 

This query is not complete: the particular order number for the Order that is to
be retrieved has not been specified. The keyword RUNTIME means that a value
must be provided when the generated program is executed. The resulting
program is interactive and asks the user "ORD-NO EQ?" to complete the query.
The FIND is then possible because the calculated key, ORD-NO, has a value
through execution of the ACCEPT statement. Following the FIND, ERRORSTATUS
is checked to see if the FIND was successfully completed.

In PARA-101 the program GETs the appropriate items in core and DISPLAYs
them. Note that the system has determined from the data base definition that
not all desired items are contained in the ORDER record; in fact CUST-NO and
CUST-NAME are contained in the CUSTOMER record. By the principle of
downward migration, the system permits the user to proceed as if CUST-NO and
CUST-NAME are part of the ORDER record because the system knows that
unique values for these items can be determined for every ORDER record
occurrence. The constructed program contains a FIND OWNER statement so that
these values can be retrieved and displayed.
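
The control flow of the generated procedure can be paraphrased in Python. This is a loose sketch under stated assumptions, not a translation the DMLP produces: the dictionaries orders and customers, the sample values, and the function run_query are all hypothetical stand-ins for the data base and for PARA-100 and PARA-101.

```python
# Hypothetical data base contents. ORDER records are reached through the
# calculated key ORD-NO; "owner" stands for the link to the owner of the
# ORDERS set occurrence (a CUSTOMER record).
customers = {"C7": {"CUST-NO": "C7", "CUST-NAME": "ACME"}}
orders = {
    1001: {"ORD-NO": 1001, "ORD-DATE": "750203", "SHIP-DATE": "750210",
           "owner": "C7"},
}


def run_query(ord_no):
    """Return the five requested items, or None when the FIND fails."""
    order = orders.get(ord_no)            # FIND ORDER RECORD via the port ORD-NO
    if order is None:                     # ERRORSTATUS IS NOT EQUAL TO 0
        return None
    customer = customers[order["owner"]]  # FIND OWNER RECORD OF ORDERS SET
    return (                              # DISPLAY ...
        order["ORD-NO"], order["ORD-DATE"], order["SHIP-DATE"],
        customer["CUST-NO"], customer["CUST-NAME"],
    )


print(run_query(1001))
```

The customer items are reached through the owner link rather than stored with the order, yet the caller sees a single flat result; this mirrors the downward migration that the system applies on the user's behalf.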


2.1.2  Extended  capability. 

The preceding example was extremely simplified for expository purposes.

Additional  features  of  the  system,  not  brought  out  by  the  example,  are  as 
follows: 


Specification of complex retrieval conditions containing
conjunction, disjunction, and universal and existential
quantification.

Specification  of  conditional  output  for  exception  reporting. 

Complex  nested  hierarchical  retrieval  and  output  descriptions. 

Calculation  of  totals,  averages,  counts,  minima,  and  maxima. 

Rules  for  efficient  code  generation. 

There  are  many  possible  extensions  to  the  system  which  are  described 
separately  in  Chapter  Four.  These  extensions  serve  as  a good  indication  of  the 
present  limitations  and  shortcomings  of  the  system. 


2.1.3  By-products. 

The effort of creating the DMLP yielded some interesting by-products:

The  HI-IQ  query  language  especially  suited  for  network  data 
bases. 




A Backus-Naur Form [Naur et al 1960] description of an
entire class of programs.

An  Algol-to-COBOL  conversion  program. 

A separate  section  of  this  chapter  will  describe  each  by-product. 

2.1.4  System  overview. 

The  DMLP  is  composed  of  two  major  subprograms  as  illustrated  in  Figure  2-2. 
The Request Handler interacts with the user as he enters his query and reports
any  differences  between  the  query  structure  and  the  data  base  structure.  The 
Request  Handler  has  knowledge  of  the  data  base  structure  and  some  general 
knowledge  of  how  the  program  generator  will  function.  It  generates  a set  of 
assertions  describing  the  query.  These  assertions  will  direct  and  constrain  the 
program  generation  phase.  After  an  entire  query  has  been  processed  by  the 
Request  Handler  it  poses  the  goal,  "write  a program",  to  the  Program  Writer. 


Figure  2-2.  System  flows. 




The Program Writer attempts to satisfy the goal posed by the Request Handler
using:


Programming techniques defined in the system.
Assertions describing the target data base structure.
Assertions describing the information retrieval query.


2.1.5 Chapter outline.

To aid the reader in the necessary jumping around between the sections so that
he can "bootstrap" his understanding, an outline of the sections is presented
below. Sections 2.3, 2.9 and 2.10 are not essential to an understanding of the
system.


2.1 Introduction.

Presents a chapter outline, a simple example and the top
level system structure.

2.2 The APG System.

Describes the logical basis of the axiomatic representation,
and the program generation method in general.

2.3 Program Generation vs a Generalized Interpreter.

A justification of the approach; it is relatively independent of
the other sections.

2.4 HI-IQ, the Query Language.

Describes the primary input to the system.

2.5 Request Handler Assertions.

Describes the internal description of a query, used to direct
and constrain program generation.

2.6 A BNF Description of the Generated Procedure.

Describes the system output in Backus-Naur form, useful for
discussion of the types of program constructs generated.

2.7 Assertions.

Describes the simple Boolean expressions with which more
complex statements of the logic can be made.

2.8 A Frame for the Semantics of Data Structures.

Describes the rules, stated in terms of the assertions,
whereby the system "understands" data structure and Data
Manipulation Language.

2.9 Efficiency Considerations.

Describes what was done, and not done, to ensure generation
of efficient programs.

2.10 Algol to COBOL Conversion.

Describes translation to COBOL from an internal Algol-like
representation of the completed program.


2.11  Examples  of  Procedure  Generation. 




2.2 The APG System.

Program construction is carried out using a domain independent automatic
program generation system, hereafter denoted by APG, reported in [Buchanan
and Luckham 1974; Buchanan 1974]. The APG has been extended in form as
well as content for the purposes of this thesis. To sketch the logical basis of
the APG, I review some elements of the logic of programs and show how the
descriptive formalism for APG (called a Frame) is formulated and used in
program generation. Sections 2.2.1 and 2.2.2 have been condensed directly
from the original reports [Buchanan and Luckham 1974; Buchanan 1974; Igarashi,
London and Luckham 1973; Hoare 1969].

2.2.1 Logic of Programs.

Statements of the logic are of the form P{A}Q where P, Q are Boolean
expressions (often called assertions) and A is a program or program part.
P{A}Q means "if P is true of the input state and A halts then Q is true of the
output state".

A rule of inference is a transformation rule from the conjunction of a set of
statements (premises, say H1,...,Hn) to a statement (conclusion, say K). Such
rules are denoted by

    H1,...,Hn
    ---------
        K




2.2.2 Frames.

The rules in a Frame F are of three kinds:

PROCEDURES transform states into states and are expressed as statements
in the logic of programs.

SCHEMES are methods for constructing programs and are expressed as
rules of inference in the logic of programs.

RELATIONAL LAWS: definitions and axioms which hold in all states and
serve to "complete" incomplete state descriptions by permitting first order
deduction of other elements of a state from those given.

A problem for program construction may be stated as a pair (I, G) where I is an
input assertion (or initial state) and G is the output assertion (or goal that must
be true in the output state). The program construction task is to construct a
program A such that I{A}R where R ⊃ G. A solution is the sequence of rules of F
used in the construction of the solution program A.


Notation: Substitutions, denoted by α, do not replace any variable that occurs in
the initial state I. Expressions, all of whose variables occur in the initial state,
are called "fully instantiated". |- denotes a first order deduction using F and
the standard rules described below.

Standard rules: A set of rules representing standard programming knowledge
is implemented in the program construction methods of the problem solving
algorithm:

R0. Assignment Axiom:      P(t){x ← t}P(x)

R1. Rule of Consequence:   P ⊃ Q, Q{A}R       P{A}Q, Q ⊃ R
                           -------------      -------------
                              P{A}R              P{A}R

R2. Rule of Composition:   P{A}Q, Q{B}R
                           -------------
                             P{A;B}R

R3. Rule of Invariance:    if P{A}Q and I |- P then I{A}Q∪I*
                           where I* is the largest subset of I consistent with Q.

R4. Change of Variables:   P(x){A(x)}Q(x)
                           --------------
                           P(y){A(y)}Q(y)

R5. Conditional Rule:      P∧Q{A}R, P∧¬Q{B}R
                           ------------------------
                           P{IF Q THEN A ELSE B}R

Frame rules: A Frame defines a programming environment using the rules
described below. These rules are used in conjunction with the standard rules
to generate programs.


S1. Primitive procedures (or operators): the rule defining
procedure p is of the form P{p}Q. The assertions P and Q
are the pre- and post-conditions of p. p must contain a
procedure name and parameter list.

S2. Iterative rules: an iterative rule definition containing the
Boolean expressions P (basis), Q (loop invariant), R (iteration
step goal), L (control test) and G (rule goal) is a rule of
inference of the form:


    P |- Q,  Q∧L{?}R,  R{??}Q∨¬L
    ----------------------------
       P{while L do ?;??}G




S3. Definitions: A definition of G in terms of P is a logical
equivalence |- P ≡ G.

S4. Axioms: A frame axiom P is a logical axiom |- P.

The Frame is compiled by the APG to form the DMLP. Each rule in the Frame
results in a Micro-Planner theorem. Such Micro-Planner constructs are not
actually theorems, but that terminology will be adhered to because of historical
precedent.

Each compiled theorem contains premises (or a pre-condition) and conclusions
(or a post-condition). For example, the Micro-Planner theorem corresponding to
the rule P{A}R has a theorem body for the pre-condition P and a calling pattern
for the post-condition R.

2.2.3  Program  generation. 

The theorems are used in a recursive subgoaling procedure to generate
information retrieval programs. The recursive procedure first builds a plan for
the target program in depth-first fashion. The plan is a tree and the branches
from a node correspond to the subgoals spawned at that node. For example, if
the current goal is R, and R is not directly true in the current state, then the
system examines the set of theorems and selects one which has a
post-condition, say Q, that matches the current goal, i.e. R ⊂ Qα for some
substitution α. Qα may not be a fully bound formula, but a complete binding
will be constructed during the generation process.

If the rule instance Pα{A}Qα achieves R as above, then Pα becomes the current
goal. If I{B}Pα (I is the current state), then by the rule of composition
I{B;A}Qα and by the rule of consequence, I{B;A}R. The system finds B;A as
the program to achieve R from the initial state I.
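
The subgoaling procedure can be illustrated with a deliberately reduced Python sketch. It is not the APG: assertions are plain strings with no variables (so no substitution is needed), rules only add assertions, and the rule names and assertion names below are hypothetical, loosely modeled on the statements of Figure 2-1. A rule whose post-condition contains the goal is selected, its pre-conditions become subgoals, and the resulting program is the composition B;A.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str      # the program part A
    pre: tuple     # pre-condition P, as an ordered tuple of assertions
    post: tuple    # post-condition Q


def achieve(goal, state, rules, depth=8):
    """Return (program, new_state) achieving `goal` from `state`, or None."""
    if goal in state:
        return [], state                  # goal already true in the current state
    if depth == 0:
        return None                       # give up on overly deep recursion
    for rule in rules:
        if goal not in rule.post:
            continue                      # post-condition does not match the goal
        program, current = [], state
        for sub in rule.pre:              # pre-conditions become subgoals
            result = achieve(sub, current, rules, depth - 1)
            if result is None:
                break
            steps, current = result
            program = program + steps
        else:
            # Composition: the steps for the pre-conditions (B), then the rule (A).
            return program + [rule.name], current | frozenset(rule.post)
    return None


rules = [
    Rule("OPEN AREA", (), ("area_open",)),
    Rule("ACCEPT ORD-NO", (), ("ord_no_known",)),
    Rule("FIND ORDER", ("area_open", "ord_no_known"), ("order_current",)),
    Rule("GET ORDER", ("order_current",), ("order_in_core",)),
    Rule("FIND OWNER", ("order_current",), ("customer_current",)),
    Rule("GET CUSTOMER", ("customer_current",), ("customer_in_core",)),
    Rule("DISPLAY", ("order_in_core", "customer_in_core"), ("displayed",)),
]

program, final_state = achieve("displayed", frozenset(), rules)
print(program)
```

The depth limit is a crude guard against rules that could otherwise be used to satisfy their own pre-conditions; the real system handles that case with the explicit restrictions described in the text.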

The subgoaling process does not usually distinguish, except as noted below,
between the types (S1 through S4) of Frame rules. The result is that all rules
are scanned for a post-condition matching the current goal. The subgoaling
process does distinguish between rules of type S4 and other rules in that only
S4 rules or the set of assertions can be used to prove the pre-conditions of an
S4 rule. Rules of type S3 and S4 are distinguished from the other rules in that
they cannot change the set of assertions. This is a consistent distinction:
since only program segments can effect changes in the environment (when
executed), only rules describing their effect should be allowed to change the
state description.

Rules may be specified having a pre-condition which matches an assertion in its
post-condition. Such a rule may be recursive. If it is not recursive then it may
not be used to satisfy its own pre-condition.

With the APG, Buchanan also introduces an interesting improvement to the logic
of programs by introducing uncertainty. The value of an assertion can be TRUE,
FALSE or UNCERTAIN. This uncertain logic recognizes that there exist
assertions which can only be meaningfully tested during execution of the
generated program. Use of a rule containing an uncertain assertion in its
pre-condition will result in the generation of a conditional procedure.
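
A minimal Python sketch of this idea follows. It is an illustration only: the enumeration Truth, the function emit, and the string form of the generated statement are assumptions for the example, not the APG's representation. An assertion with no known value at generation time is treated as UNCERTAIN, and the step that depends on it is wrapped in a test to be made at run time.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "TRUE"
    FALSE = "FALSE"
    UNCERTAIN = "UNCERTAIN"


def emit(step, precondition, known):
    """Emit `step`, guarded by `precondition` when its value is uncertain."""
    value = known.get(precondition, Truth.UNCERTAIN)
    if value is Truth.TRUE:
        return step                              # unconditional code
    if value is Truth.FALSE:
        return None                              # the rule cannot be used
    return f"IF {precondition} THEN {step}"      # decided during execution


print(emit("PERFORM PARA-101", "ERRORSTATUS = 0", {}))
```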

Assertions describe the pre- and post-conditions of the rules. Assertions also
describe the current state. The 58 different assertions used in the data base
programming environment are described in section 2.7.

2.2.4 APG extension.

The APG constructs a program of the form P{A}Q if Q contains a complete and
specific description of the goal. This is a correct but quite cumbersome
approach for the data base retrieval programming problem because the goal
statement is quite complex. Increased complexity in the goal statement results
in increased complexity in sub-goals so that the search tree becomes quite
large.

The APG paradigm has been modified for the data base application to permit
specification of most of the goal in P. That is, the goal Q is the general "write a
program" and P contains the specific constraints on the information set to be
retrieved by the goal program. In other words, the query becomes part of the
programming environment.

This allows selection of specific sub-goals from the set of assertions (P) by
those rules which are specifically directed by the subgoal in question. Filtering
the goal to the appropriate rules in this manner reduces the "baggage" carried
by the subgoaler and dramatically reduces fruitless search: on average only
14% of the rules tried are unnecessary.




2.3 Program Generation vs a Generalized Interpreter.

The terms compilation, interpretation and automatic program generation are
related, but I differentiate between them as follows:

Compilation - translation of an entire program from high
level procedure to low level procedure.

Automatic program generation - translation from a
non-procedural description of a program to procedure.

Interpretation - statement by statement translation and
execution of procedural commands or non-procedural
descriptions.


Rather than generating reusable procedure, interpreters generate immediate
results. In a production environment, the classical break even analysis shows
that procedure generators have a cost advantage if expected usage exceeds the
break even point (Figure 2-3).
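
The break even argument can be written out under a simple cost model, which is an assumption of this sketch and is not taken from the text: generation incurs a one-time cost, each run of a generated program costs less than each interpreted run, and all costs are constant. The function and parameter names are hypothetical.

```python
def average_cost_generated(generation_cost, run_cost, usages):
    """Average cost per usage when the one-time generation cost is amortized."""
    return generation_cost / usages + run_cost


def break_even_usages(generation_cost, generated_run_cost, interpreted_run_cost):
    """Usage count at which generated and interpreted average costs are equal."""
    if interpreted_run_cost <= generated_run_cost:
        return None     # interpretation never costs more; no break even point
    return generation_cost / (interpreted_run_cost - generated_run_cost)


# Illustrative numbers only: generation costs 90 units, a generated run 1 unit,
# an interpreted run 10 units.
print(break_even_usages(90, 1, 10))
print(average_cost_generated(90, 1, 10))
```

With these figures the two averages meet at ten usages; beyond that point the generated program is cheaper per usage, and below it the interpreter is.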


[Plot: average cost per usage against USAGE.]

Figure 2-3. Average cost per usage for generated and
interpreted programs in a production environment.




A break even analysis is important in the choice of a vehicle for a
particular application, but in a research environment this is not the type of
choice being made. Instead, researchers should attempt to provide both so that
the choice is available to users.

Procedure generators are data free. This permits partial debugging of a system
without a data base. To do the same with an interpreter would require the
construction of a data simulator for the interpreter.

Data freeness was an important reason for building the DMLP as an automatic
programming system. As a matter of fact, no data bases have been built with
which to test the generated procedures. I expect that some generated
programs will probably contain errors because this type of testing has not been
done, but this thesis stands as a demonstration of how far the work could
proceed without data bases.

Other reasons for choosing the APG were:

The formalism discussed in section 2.2 - helps break down
the problem and provides a framework for discussion.

The generation of programs as a product - provides feedback
to the researcher on the operation of the DMLP and can be
used to discuss results.

The second advantage is not major because the same can be accomplished with
an interpreter if it has an internal trace.


In a preceding paragraph I stated that researchers should build both
interpreters and procedure generators so that the best method can be selected
in the production environment. My experience with the DMLP as a procedure
generator will aid in building such an interpreter: Section 2.6 presents a BNF
description for all programs corresponding to HI-IQ queries.




2.4 HI-IQ, the Query Language.

In Chapter One I stated that people tend to communicate information in
hierarchical, matrix or combined hierarchical-matrix format, even if we are aware
of the multidimensionality of the data. Usually we communicate a particular
piece of information within the context of only a few relations that it may have
to other information. This is so because speech, hearing, writing and reading
(but perhaps not vision) are essentially serial (single channel, one dimensional)
processes.

To communicate information of more than one dimension through a single
channel requires the use of markers and codes to establish dimensionality
contexts. After encountering several context changing markers in a
communication, people soon find it difficult to keep several "trains of thought"
(note the linearity implicit in this idiom) going at once.

HI-IQ (Hierarchical Interactive Query) language is designed for the specification
of hierarchical-matrix reports. Other design goals for HI-IQ were flexibility in
specifying hierarchical contexts, and the ability to specify the calculation of
statistics. I also wanted to make the system usable for people with minimal
knowledge of data base structure. This meant that the HI-IQ processor had to
understand some of the concepts discussed in Chapter 1 such as downward
attribute migration and confluent hierarchies.




Because of the preponderance of hierarchies in report structure, statistical
calculation, and logical quantification, it seemed only natural to give HI-IQ a
hierarchical structure. The hierarchical query structure is reinforced visually
for the user by further indentations of system prompts for every hierarchical
level referenced.

A HI-IQ query contains one or more hierarchical levels. Each level is used to
specify a matrix in the output report, to specify the calculation of a statistic, or
to check the truth value of a quantified condition. A simple one level query
results in a report consisting of a single matrix which contains no statistics.
Such a query is illustrated in Figure 2-1.

It is not unusual for the definition of a particular level to be interrupted by the
definitions of lower levels. If the user desires the calculation of a particular
statistic, say an average, then the system next asks him to define the calculation
of that average before proceeding with the further specification of the level in
which the average was requested. The prompt indentation indicates to the user
which hierarchical level he is currently in.

Figure 2-4 presents a BNF description of HI-IQ. Since HI-IQ is an interactive
language, only portions of a query are typed by the user. To distinguish such
entries from the characters typed by the system, all system typed characters
have been underlined.




Backus-Naur Form (BNF) [Naur et al 1960] is a formalism invented for the
description of programming languages, specifically the grammatical structure
(syntax rather than semantics) of those programming languages. BNF can also
be used to describe non programming languages, such as a restricted subset of
English. An excellent description and illustration of BNF can be found in
[McKeeman et al 1970].


The sequence of prompts for a particular hierarchical level always consists of
three subsequences. These three subsequences are <Record name>,
<CONDITION LINES> and <ITEM LINES>.




<QUERY>            ::=  <LEVEL>

<LEVEL>            ::=  PRIMARY RECORD FOR (<COMMAND>)
                        +<Record name>
                        CONDITIONS FOR RETRIEVAL
                        <CONDITION LINES>
                        ITEMS OR STATS <MODIFIER>
                        <ITEM LINES>

<CONDITION LINES>  ::=  +NIL |
                        <CONDITION LINE> <CONDITION LINES>

<CONDITION LINE>   ::=  +OR | +ALL <LEVEL> |
                        +ANY <LEVEL> | +<TEST>

<TEST>             ::=  (<IOC> <REL> <IOCR>) |
                        (<IOC> <REL> <STAT>) <LEVEL> |
                        (<STAT> <REL> <IOCR>) <LEVEL> |
                        (<STAT> <REL> <STAT>) <LEVEL> <LEVEL>

<IOCR>             ::=  RUNTIME | <IOC>

<IOC>              ::=  <Item name> | <Constant>

<REL>              ::=  LE | LT | GE | GT | EQ | NE

<ITEM LINES>       ::=  +NIL |
                        <ITEM LINE> <ITEM LINES>

<ITEM LINE>        ::=  +<IOC> | +<STAT> <LEVEL> |
                        +REPEAT <LEVEL> | +ONE <LEVEL> |
                        +COND <TEST>

<COMMAND>          ::=  MAIN | ALL | ANY |
                        REPEAT | ONE | <STAT>

<STAT>             ::=  COUNT | TOT | AVE |
                        MIN | MAX

<MODIFIER>         ::=  TO BE DISPLAYED | FOR AVE |
                        FOR TOT | FOR MIN | FOR MAX

Figure 2-4. BNF for the HI-IQ language.
(System prompts are underlined.)




2.4.1 <Record name> subsequence.

The <Record name> subsequence consists of a single system prompt and user
reply wherein the user must name the context record for the current
hierarchical level. It is possible to have the system determine the context
record for a particular level from the other two prompt subsequences. This
would further reduce the knowledge that the user must have to use the system,
but it would also increase the possibility of undetected errors because of the
loss of redundancy.

2.4.2 <CONDITION LINES> subsequence.

The <CONDITION LINES> subsequence of prompts in a query level is of
indefinite length. This prompt sequence defines the condition that must be true
to retrieve the context record. The condition is specified using the logical
connectives AND and OR and a set of tests in a disjunctive form. That is to say,
if A, B, C, and D are all tests, A∧B∨C∧D is equivalent to (A∧B)∨(C∧D). However,
the latter specification is not allowed; the user cannot control the bindings of
the logical connectives, AND and OR. This is not a major restriction. Any
condition can be specified in disjunctive form, albeit in a cumbersome way.
Because of the immediate binding of AND, it is the default connector and need
not be specified by the user.
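
The binding rule can be illustrated with a short Python sketch. The function
name group_condition_lines and the representation of tests as plain strings are
assumptions made for illustration; they are not part of HI-IQ or the DMLP.

```python
def group_condition_lines(lines):
    """Group condition lines into disjuncts (a list of conjunctions).

    Consecutive tests are joined by the default connector AND,
    an "OR" line starts a new disjunct, and "NIL" ends the sequence.
    """
    disjuncts = [[]]
    for line in lines:
        if line == "NIL":
            break
        if line == "OR":
            disjuncts.append([])
        else:
            disjuncts[-1].append(line)
    # Drop empty groups, e.g. when the sequence is just NIL.
    return [d for d in disjuncts if d]


# A AND B OR C AND D is read as (A AND B) OR (C AND D).
print(group_condition_lines(["A", "B", "OR", "C", "D", "NIL"]))
```

Because AND always binds first, the grouping needs no parentheses: each OR
simply closes the current conjunction.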

Particular tests are of the form (A REL B). REL can have one of six values: EQ,
NE, LT, LE, GT, GE. "A" can be an item name, a statistic, or a numeric or
non-numeric literal. "B" can be any of these; in addition "B" can be the
keyword RUNTIME. The use of RUNTIME signals that the generated program will
be an interactive program. If execution of the generated program becomes
dependent on an actual value for "B", it (the generated program) will prompt the
user with "A REL?", and the user's reply will be used to determine the truth
value of (A REL B).

Universal  or  existential  quantification  can  be  specified  as  part  of  a condition. 
Since  quantification  is  only  meaningful  over  a set  of  possible  values,  the  user 
must  be  ready  to  define  a new  hierarchical  level  for  every  quantifier  specified. 
After  encountering  either  of  the  quantifiers  ALL  or  ANY,  the  system 
automatically  proceeds  to  prompting  for  the  definition  of  a new  hierarchical 
level. 

The system also proceeds to a new hierarchical level whenever the user
specifies a statistic so that the calculation of the statistic can be defined. A
statistic is specified with one of the following keywords: COUNT, TOT, AVE, MIN,
and MAX.


CONDITIONS FOR RETRIEVAL
+(PATNO EQ RUNTIME)
+(PATAGE LT 25)
+NIL


CONDITIONS FOR RETRIEVAL
+(SALARY LT 6000)
+OR
+(SALARY LT 10000)
+ANY
     PRIMARY RECORD (ANY)
     +DEPENDENT
     CONDITIONS FOR RETRIEVAL
     +(AGE LT 21)
     +NIL
+NIL


Figure 2-5. Two examples of retrieval conditions.


Figure 2-5 illustrates two retrieval conditions. The first is a simple conjunction
of two tests. The second condition indicates that a record (employee) should be
retrieved if the employee has a salary below $6,000, or if he has a salary below
$10,000 and at least one dependent child.


2.4.3 <ITEM LINES> subsequence.

The <ITEM LINES> subsequence is also of indefinite length and is used to define
the matrix associated with the current hierarchical level. This matrix is either
an output matrix (for the report) or a statistical matrix, depending on the
command which invoked the current hierarchical level. The system calculates
the statistic in each column of a statistical matrix. In other words, every column
is totalled or averaged, or the minimum or maximum is found in every column of
the statistical matrix.
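
A minimal Python sketch of this column-wise calculation follows. The matrix is
represented as a list of rows, and the function name column_statistic is an
assumption for illustration only; the DMLP itself generates COBOL, not Python.

```python
def column_statistic(stat, rows):
    """Apply one statistic to every column of a statistical matrix.

    `rows` is a list of equal-length tuples of numbers; the result has
    one entry per column.
    """
    columns = list(zip(*rows))
    if stat == "TOT":
        return [sum(c) for c in columns]
    if stat == "AVE":
        return [sum(c) / len(c) for c in columns]
    if stat == "MIN":
        return [min(c) for c in columns]
    if stat == "MAX":
        return [max(c) for c in columns]
    raise ValueError("unknown statistic: " + stat)


# Two lines, two columns (hypothetical values).
matrix = [(48, 120), (22, 80)]
print(column_statistic("TOT", matrix))
print(column_statistic("AVE", matrix))
```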

The reply to a prompt in the <ITEM LINES> subsequence must be any one of the
statistical commands, an item name, the REPEAT command, the ONE command, the
COND command or NIL. NIL terminates the subsequence.


Entering  an  item  name  or  COUNT  defines  a column  of  the  matrix.  Entering  any 
other  statistic  will  define  one  or  more  columns  of  the  matrix  depending  in  turn 
on  the  number  of  columns  in  the  matrix  defined  for  that  particular  statistic. 
This  corresponds  to  the  upward  migration  of  statistical  values  that  I discussed  in 
Chapter One.

Any of the commands (with the exception of the COND command) will cause the
system to initiate a new hierarchical level, so that the user can further define
the action associated with the command. The REPEAT command is used for
generating hierarchical reports. Figures 2-6 and 2-7 illustrate a complete
query and the report it defines.



PRIMARY RECORD (MAIN)
+DOCTOR
CONDITIONS FOR RETRIEVAL
+NIL
ITEMS OR STATS TO BE DISPLAYED
+DOCNAME
+DOCAGE
+SPECIALTY
+REPEAT
     PRIMARY RECORD (REPEAT)
     +PATIENT
     CONDITIONS FOR RETRIEVAL
     +(PATAGE GT 21)
     +NIL
     ITEMS OR STATS TO BE DISPLAYED
     +PATNAME
     +PATAGE
     +DIAGNOSIS
     +NIL
+NIL


Figure 2-6. Query (PI) for the report of Figure 2-7.




FREDERICKS     41   GP
     SMITH          48   BOTULISM
     JONES          22   APPENDICITIS
BROWN          36   INTERNIST
     SMITH          48   BOTULISM
SUNDER         52   GEN SURGERY
     JONES          22   APPENDICITIS
GLUE           49   GYNECOLOGY
     WILLIAMSON     31   MISCARRIAGE

Figure 2-7. A hierarchical-matrix report.


The top level matrix of this report contains three columns for DOCNAME,
DOCAGE and SPECIALTY. The secondary matrix, hierarchically nested in the top
level matrix, also contains three columns, PATNAME, PATAGE and DIAGNOSIS.

Note that in Figure 2-6, the prompt sequence for both levels of the report
included all three sub-sequences as defined earlier. This query specifies a
condition in the second level (on the retrieval of patients). This condition will
not affect the retrieval of DOCTOR records or any other records not within the
context of the PATIENT record. The condition (PATAGE GT 21) applies only to
this particular context of the PATIENT record. The PATIENT record could have
been referenced elsewhere in the query, and the condition (PATAGE GT 21)
would not have applied.


The function of the ONE command is very similar to the REPEAT command except
that only the first line of the matrix at the next level will be retrieved and
displayed in the report. If the REPEAT command in Figure 2-6 is replaced with
a ONE command, then the resulting report would resemble Figure 2-7 with the
exception of the third line (which would not be included).
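
The difference between REPEAT and ONE can be sketched in Python. The sample
data below is modeled on the first two doctors of Figure 2-7, and the function
report is a hypothetical stand-in for the generated program, written only to
show the effect of the two commands.

```python
# Each doctor: (name, age, specialty, list of patient lines).
DOCTORS = [
    ("FREDERICKS", 41, "GP",
     [("SMITH", 48, "BOTULISM"), ("JONES", 22, "APPENDICITIS")]),
    ("BROWN", 36, "INTERNIST",
     [("SMITH", 48, "BOTULISM")]),
]


def report(doctors, command="REPEAT"):
    """Build the report lines for a two-level hierarchical query."""
    lines = []
    for name, age, specialty, patients in doctors:
        lines.append(f"{name} {age} {specialty}")
        # REPEAT shows every line of the nested matrix; ONE shows only the first.
        selected = patients if command == "REPEAT" else patients[:1]
        for pname, page, diagnosis in selected:
            lines.append(f"     {pname} {page} {diagnosis}")
    return lines


for line in report(DOCTORS, "ONE"):
    print(line)
```

With this data the ONE report omits exactly one line, the second patient of
the first doctor, which is the third line of the REPEAT report.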

The user can control the appearance of particular attribute values on a
particular line with the conditional output (COND) command. It is especially
useful for exception reporting. Subsequent to encountering the COND command,
the system responds as if a new hierarchical level had been specified, except
that the first prompt sub-sequence is skipped. The first sub-sequence is not
necessary because COND cannot change the record context.

There are a few other cases in which the full prompting sequence is not
applicable, and other prompting sub-sequences will occasionally be suppressed.
The third sub-sequence is not entered for the COUNT command because
counting applies only to line occurrences of a matrix; the columns of the matrix
do not affect it.

Similarly, it does not make sense to specify a matrix within the context of a
condition quantifier (ALL or ANY), so again the third prompt sub-sequence is not
entered by the system.




2.4.4 Port selection.

When  the  query  has  been  completed  the  system  tells  the  user  which  items,  if 
any, might be used for a calculated direct access. The user is then asked to
select  one  of  these  items  or  none.  Proper  selection  can  reduce  searches 
through  the  data  base.  For  an  item  to  qualify  for  use  as  a calculated  key  it 
must  satisfy  all  of  several  restrictions: 

a)  It  must  be  defined  as  a calculated  key. 

b)  It  must  have  been  used  in  a test  with  an  equality 
relation. 

c)  If  used  in  a test  within  a disjunct,  then  it  must  have  been 
used  in  a test  within  every  disjunct  in  that  condition,  and 
every such test must satisfy condition b.

The  need  for  the  first  two  restrictions  is  obvious.  The  third  restriction  is 
necessary  because  use  of  an  item  as  a calculated  key  which  meets  the  first  two 
but  not  the  third  restriction  cannot  result  in  improved  efficiency  of  the 
generated  program.  This  is  so  since  all  disjuncts  not  containing  the  necessary 
test  will  cause  an  unlimited  search  anyway.  An  improvement  for  a more 
sophisticated  system  would  be  to  relax  restriction  c to  require  a calculated  key 
in  every  disjunct,  but  not  necessarily  the  same  calculated  key. 
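
The three restrictions can be sketched as a small Python check. This is one
reading of the rules, under stated assumptions: a condition is a list of
disjuncts, each disjunct is a list of (REL, A, B) tests, and the names
qualifies_as_calc_key and calc_keys are hypothetical.

```python
def qualifies_as_calc_key(item, calc_keys, disjuncts):
    """Return True if `item` may be offered as a calculated key.

    calc_keys: set of item names defined as calculated keys (restriction a).
    disjuncts: list of conjunctions, each a list of (rel, a, b) tests.
    """
    if item not in calc_keys:                      # restriction a
        return False
    if not disjuncts:                              # no test at all
        return False
    for conjunction in disjuncts:
        rels = [rel for rel, a, b in conjunction if item in (a, b)]
        # restriction c: the item must be tested in every disjunct ...
        if not rels:
            return False
        # ... and every such test must use equality (restriction b).
        if any(rel != "EQ" for rel in rels):
            return False
    return True


keys = {"PATNO"}
cond = [[("EQ", "PATNO", "RUNTIME"), ("LT", "PATAGE", 25)]]
print(qualifies_as_calc_key("PATNO", keys, cond))
```

A disjunct that never tests the item forces an unrestricted search, which is
why the check fails as soon as one such disjunct is found.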




2.5 Request Handler Assertions.

The Request Handler describes the query with a set of assertions. An assertion
must conform to one of the templates in Figure 2-8.

There will be at most one ISPORT assertion for a particular query. This is
asserted only if REC satisfies all criteria listed in section 2.4.4, and if it has been
selected by the user for use as a calculated key.

There is exactly one TOBEOPND assertion per query. The AREAS list indicates
which areas of the data base contain records that will be accessed during the
processing associated with the query. Determination of the AREAS list is not
simply accomplished by tallying the names of the areas that contain the records
referenced in the query. It is possible that the generated program will access
areas not directly referenced via record names in the query. This situation
occurs if an access path between two records passes through an intermediate
record. Determination of the AREAS list therefore involves a determination of
all access paths.




ISPORT(REC)
     Use REC as a port.

TOBEOPND(AREAS)
     AREAS is a list of all areas containing records which
     may be accessed in the query.

LINKS(TYPE,REC1,REC2,LEVEL)
     LEVEL is a Dewey-decimal identification of a query
     level. This level has REC2 as the context record and
     was entered with the command indicated by TYPE
     from a level which had REC1 as the context record.

FOR(COND,LEVEL)
     COND specifies the condition whose truth must be
     established prior to any processing of the matrix.

TOBEUSED(ITEMS,LEVEL)
     ITEMS identifies the columns in the matrix for LEVEL.

ISVAR(VAR)
     VAR is a system generated variable.


Figure 2-8. Templates for Request Handler assertions.


LINKS, FOR and TOBEUSED are each asserted at most once for every level
specified in the query. There is a one to one correspondence between these
three and the three subsequences of prompting. The LINKS assertion defines
the context of a query level and assigns the LEVEL identifier. The FOR
assertion defines the retrieval condition, and the TOBEUSED assertion defines
the matrix.

A Dewey-decimal scheme is used to identify the levels of the query. The top
level is identified as X. The first level occurring within the context of the top
level is identified as X.1. X.2.1 identifies the first hierarchy in the second
hierarchy occurring within the top level of the query.
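
The numbering scheme amounts to a simple tree walk, sketched here in Python.
The nested-tuple representation of a query and the function name
assign_levels are assumptions for illustration.

```python
def assign_levels(level, ident="X", out=None):
    """Assign Dewey-decimal identifiers to a nested query.

    `level` is (record_name, [sub-levels]); the top level is "X",
    its n-th sub-level is "X.n", and so on.
    """
    if out is None:
        out = []
    record, children = level
    out.append((ident, record))
    for n, child in enumerate(children, start=1):
        assign_levels(child, f"{ident}.{n}", out)
    return out


# The two-level query of Figure 2-6: DOCTOR with a nested PATIENT level.
print(assign_levels(("DOCTOR", [("PATIENT", [])])))
```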

2.5.1  LINKS  assertion. 

The  TYPE  parameter  of  LINKS  is  actually  a list  containing  two  sub-parameters. 
Values  of  the  first  sub-parameter  are  limited  to  the  names  given  in  Figure  2-9. 
With  the  exception  of  MAIN,  these  are  all  commands  which  invoke  new  query 
levels.  MAIN  is  used  to  identify  the  top  level  of  the  query  and  has  the  same 
interpretation  as  the  REPEAT  command. 


The second TYPE sub-parameter is a unique system-created variable name.
This variable is used for counting record occurrences if the first sub-parameter
is COUNT or AVE, or for controlling quantification if the first sub-parameter is
ONE, ALL or ANY. Although TYPE will always contain a variable name as its
second parameter, this variable is only used by the Program Writer if the first
parameter in TYPE is one of the five commands indicated above.


MAIN        ALL
REPEAT      ANY
ONE         COUNT
            TOT
            AVE
            MIN
            MAX


Figure 2-9. Possible values of the TYPE parameter in LINKS.




2.5.2 FOR assertion.

The COND parameter of FOR is a list containing all of the tests specified in the
CONDITIONS FOR RETRIEVAL subsequence of a particular level. A test is
described in a sub-list containing five entries. The first entry is the relation
involved in the test and the second and fourth entries are the arguments of the
test. The third entry gives the query level for calculation of the first argument.
Similarly, the fifth entry gives the level number associated with the second
argument of the assertion. These level numbers will be the same as the value
of LEVEL in the FOR assertion if the argument is a constant or is available from
the context record.


For example, such a list might be (EQ COUNT X.1 5 X). This test indicates that
count, as defined in the X.1 LEVEL, must be equal to 5.


Disjunction and conjunction is indicated in the COND list as follows. A simple
list of tests represents a conjunction of those tests. A list in turn of such
conjunctions represents a disjunction. This list structure bears a close
resemblance to the disjunctive form that the user must use to phrase the
retrieval condition. If A, B, C and D are tests, then the COND list for A∧B∨C∧D
would be ((A,B)(C,D)).
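
As an illustration of this list structure, the following Python sketch
evaluates a COND list whose tests are five-entry tuples (relation, first
argument, its level, second argument, its level). The dictionary of known
values and the names lookup and cond_holds are assumptions, not part of the
DMLP.

```python
import operator

RELATIONS = {
    "EQ": operator.eq, "NE": operator.ne,
    "LT": operator.lt, "LE": operator.le,
    "GT": operator.gt, "GE": operator.ge,
}


def lookup(arg, level, values):
    """Resolve an argument: known (name, level) pairs yield their value,
    anything else is treated as a constant."""
    return values.get((arg, level), arg)


def cond_holds(cond, values):
    """A COND list is a disjunction (outer list) of conjunctions (inner lists)."""
    return any(
        all(RELATIONS[rel](lookup(a, la, values), lookup(b, lb, values))
            for rel, a, la, b, lb in conjunction)
        for conjunction in cond
    )


values = {("SALARY", "X"): 8000, ("COUNT", "X.1"): 5}
cond = [
    [("LT", "SALARY", "X", 6000, "X")],
    [("LT", "SALARY", "X", 10000, "X"), ("EQ", "COUNT", "X.1", 5, "X")],
]
print(cond_holds(cond, values))
```

The nesting of any over all mirrors the two list levels directly, so no
separate representation of AND and OR is needed.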


DMLP constructs tests to enforce quantification. If the user specifies universal
quantification, the system inserts the test (ALL EQ 0). The Program Writer will
eventually construct the program so that a variable associated with ALL (defined
in the TYPE parameter of the LINKS assertion) is set to non-zero if the
associated condition is ever false. This variable, also called a quantification flag,
signals the truth value of the entire condition.

Similarly, specifying existential quantification results in a test (ANY EQ 1). The
variable associated with ANY is set to non-zero in the generated procedure if
the associated condition is ever true.
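
The behaviour of the two quantification flags can be sketched in Python. The
functions all_flag and any_flag are hypothetical stand-ins for the loops that
the generated procedure would contain.

```python
def all_flag(records, condition):
    """Flag for ALL: stays 0 unless the condition is ever false."""
    flag = 0
    for record in records:
        if not condition(record):
            flag = 1
    return flag


def any_flag(records, condition):
    """Flag for ANY: stays 0 unless the condition is ever true."""
    flag = 0
    for record in records:
        if condition(record):
            flag = 1
    return flag


ages = [12, 19, 30]          # hypothetical dependent ages


def under_21(age):
    return age < 21


# The inserted tests are (ALL EQ 0) and (ANY EQ 1).
print(all_flag(ages, under_21) == 0)
print(any_flag(ages, under_21) == 1)
```

Both flags start at zero, so the test (ALL EQ 0) succeeds on an empty set
while (ANY EQ 1) fails on it, as expected of universal and existential
quantification.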

2.5.3 TOBEUSED assertion.

TOBEUSED describes the matrix associated with a query LEVEL. The ITEMS
parameter is again a list, each entry describing a column of the matrix. The
entry describing a column is in turn also a list consisting of three entries. The
first of these is an item name, statistic command, or constant. The second entry
indicates the query level where the calculation of the entries in the column is
defined. The third entry assigns a variable name which can be used by the
system for the calculation of a statistic.

As an illustration, Figure 2-10 gives the complete set of assertions derived by
the Request Handler from the query of Figure 2-6. It is these assertions, along
with assertions describing the data base, that the Program Writer will use to
generate the desired procedure.




TOBEOPND ((A1 A2))

LINKS ((MAIN) PORT DOCTOR X)

LINKS ((REPEAT X4) DOCTOR PATIENT (X . 1))

FOR ((((GT PATAGE (X . 1) (21) (X . 1)))) (X . 1))

TOBEUSED (((DOCNAME X X1) (DOCAGE X X2) (SPECIALTY X X3)
           (REPEAT (X . 1) X10)) X)

TOBEUSED (((PATNAME (X . 1) X5) (PATAGE (X . 1) X6)
           (DIAGNOSIS (X . 1) X7)) (X . 1))

ISVAR (X10)
ISVAR (X7)
ISVAR (X6)
ISVAR (X5)
ISVAR (X4)
ISVAR (X3)
ISVAR (X2)
ISVAR (X1)


Figure 2-10. Assertions describing the query of Figure 2-6.




2.6 A BNF Description of the Generated Procedure.

The Frame, described in section 2.8, defines the set of programs semantically and
permits a derivation of the correct syntax. This would seem to make a BNF
description superfluous. However, knowing something about the general syntax
of the generated programs will make the Frame easier to understand.


BNF is usually used to describe a computer language. It is possible to generate
a BNF description of COBOL as augmented by the Data Manipulation Language
and this BNF description would include all the programs generated by the DMLP.
The BNF description in Figure 2-11 is more restricted and describes only a
subset of all programs that can be written in COBOL. But it still encompasses
all programs that will be generated by the DMLP.

The BNF of Figure 2-11 uses the convention that lower case strings enclosed in
< > are terminal symbols. This convention is used for symbols which stand for
COBOL or DMS names or literals. These names or literals must abide by the
COBOL and DMS rules.




<PROGRAM>          ::=  PROCEDURE DIVISION. <MAIN SECTION> |
                        <PROGRAM> <SUB SECTION>

<MAIN SECTION>     ::=  PROC1 SECTION. <MAIN PARAGRAPH> |
                        <MAIN SECTION> <SUB PARAGRAPH>

<SUB SECTION>      ::=  <SECTION NAME> SECTION. <SUB PARAGRAPH> |
                        <SUB SECTION> <SUB PARAGRAPH>

<MAIN PARAGRAPH>   ::=  <PARAGRAPH NAME>.
                        <OPEN> <PORT> <CLOSE> STOP.

<SUB PARAGRAPH>    ::=  <LOOP BODY> | <ACTION PARAGRAPH> |
                        <INTERACTIVE PARAGRAPH>

<OPEN>             ::=  OPEN AREA <AREA LIST>.

<CLOSE>            ::=  CLOSE AREA <AREA LIST>.

<AREA LIST>        ::=  <Area name> | <AREA LIST> <Area name>

<PORT>             ::=  <FIND> <ACTION> | <LOOP CONTROL>

<FIND>             ::=  <INIT CALCKEY> FIND <Port record> RECORD.

<INIT CALCKEY>     ::=  MOVE <HAS VALUE> TO <Calc key>. |
                        DISPLAY "<Calc key>" "EQ?".
                        ACCEPT <Calc key>.

<LOOP CONTROL>     ::=  FIND FIRST <Record name> RECORD
                            OF <LOOP CONTEXT>.
                        MOVE CURRENCY STATUS FOR <LOOP CONTEXT>
                            TO <Record data base key>.
                        PERFORM <LOOP PARAGRAPH NAME>
                            UNTIL <CONTROL TEST>.

<LOOP CONTEXT>     ::=  <Set name> SET | <Area name> AREA

<CONTROL TEST>     ::=  ERRORSTATUS IS NOT EQUAL TO 0 |
                        (ERRORSTATUS IS NOT EQUAL TO 0)
                            OR (<SYSTEM TEST>)


<LOOP BODY>        ::=  <LOOP PARAGRAPH NAME>.
                        MOVE CURRENCY STATUS FOR <LOOP CONTEXT>
                            TO <VARIABLE NAME>.
                        <ACTION>
                        MOVE <VARIABLE NAME> TO <Record data base key>.
                        FIND <Record name> USING <Record data base key>.
                        FIND NEXT <Record name> RECORD
                            OF <LOOP CONTEXT>.

<ACTION PARAGRAPH> ::=  <ACTION PARAGRAPH NAME>. <ACTION>

<ACTION>           ::=  <DISPLAY SEQUENCE> | <STATISTIC SEQUENCE> |
                        <CONDITION TEST> | <SET Q FLAG> |
                        <SET Q FLAG> <CONDITION TEST> |
                        <DISPLAY SEQUENCE> <SET Q FLAG> |
                        <FINDOWNER SEQUENCE> <ACTION>

<DISPLAY SEQUENCE> ::=  <DISPLAY> | <VALUE DETERMINATIONS> <DISPLAY> |
                        <DISPLAY SEQUENCE> <CONDITION TEST> |
                        <DISPLAY SEQUENCE> <LOOP CONTROL>

<DISPLAY>          ::=  DISPLAY <DISPLAY LIST>.

<DISPLAY LIST>     ::=  <HAS VALUE> | <DISPLAY LIST> <HAS VALUE>

<STATISTIC SEQUENCE> ::=
                        <CALCULATION> |
                        <VALUE DETERMINATION> <CALCULATION> |
                        <STATISTIC SEQUENCE> <CALCULATION> |
                        <STATISTIC SEQUENCE> <VALUE DETERMINATION>
                            <CALCULATION>

<CALCULATION>      ::=  MOVE <COUNT VARIABLE NAME> + 1
                            TO <COUNT VARIABLE NAME>. |
                        MOVE <TOTAL VARIABLE NAME> + <ITEM OR VAR>
                            TO <TOTAL VARIABLE NAME>. |
                        MOVE <MIN OR MAX> OF <STATISTIC VARIABLE NAME>
                            AND <ITEM OR VAR> TO
                            <STATISTIC VARIABLE NAME>

<MIN OR MAX>       ::=  MINIMUM | MAXIMUM

<SET Q FLAG>       ::=  MOVE <ZERO OR ONE> TO <QUANTIFICATION FLAG>.




<CONDITION TEST>   ::=  <IF> | <CONDITION VALUES> <IF>

<IF>               ::=  IF <TEST> <CONTINUATION>
                        ELSE PERFORM <ACTION PARAGRAPH NAME>.

<CONTINUATION>     ::=  NEXT SENTENCE |
                        PERFORM <ACTION PARAGRAPH NAME>

<VALUE DETERMINATIONS> ::=
                        <VALUE DETERMINATION> |
                        <VALUE DETERMINATIONS> <VALUE DETERMINATION>

<VALUE DETERMINATION> ::=
                        <STATISTIC CALC> | <GET> |
                        <FINDOWNER SEQUENCE> <GET> | <RUNTIME>

<STATISTIC CALC>   ::=  <STAT INIT> <LOOP CONTROL> |
                        <STAT INIT> <STAT INIT> <LOOP CONTROL>
                        MOVE <TOTAL VARIABLE NAME>
                            DIVIDED BY <COUNT VARIABLE NAME>
                            TO <AVE VARIABLE NAME>.

<STAT INIT>        ::=  MOVE 0 TO <STATISTIC VARIABLE NAME>.

<GET>              ::=  GET <Record name> RECORD.

<FINDOWNER SEQUENCE> ::=
                        FIND OWNER RECORD OF <Set name> SET. |
                        <FINDOWNER SEQUENCE>
                        FIND OWNER RECORD OF <Set name> SET.

<RUNTIME>          ::=  IF <Item name> IS NOT EQUAL
                            TO HIGH VALUES NEXT SENTENCE
                        ELSE PERFORM <INTERACTIVE PARAGRAPH NAME>

<INTERACTIVE PARAGRAPH> ::=
                        <INTERACTIVE PARAGRAPH NAME>.
                        DISPLAY "<Item name>" "<REL>?".
                        ACCEPT <Item name>.




<CONDITION VALUES> ::=  <CONDITION VALUE> |
                        <CONDITION VALUE> <CONDITION VALUE>

<CONDITION VALUE>  ::=  <FIND> | <QUANTIFICATION> |
                        <VALUE DETERMINATION>

<QUANTIFICATION>   ::=  MOVE 0 TO <QUANTIFICATION FLAG>.
                        <LOOP CONTROL>

<HAS VALUE>        ::=  <ITEM, VAR OR NUM> | <Non-numeric literal>

<ITEM, VAR OR NUM> ::=  <ITEM OR VAR> | <Numeric literal>

<ITEM OR VAR>      ::=  <Item name> | <VARIABLE NAME>

<TEST>             ::=  ERRORSTATUS IS NOT EQUAL TO 0 |
                        <USER'S NEGATED TEST> |
                        <SYSTEM TEST>

<USER'S NEGATED TEST> ::=
                        <HAS VALUE> <RELATION> <HAS VALUE>

<SYSTEM TEST>      ::=  <STATISTIC VARIABLE NAME> <RELATION>
                            <ITEM, VAR OR NUM> |
                        <QUANTIFICATION FLAG> <RELATION>
                            <ZERO OR ONE>

<RELATION>         ::=  IS EQUAL TO | IS NOT EQUAL TO |
                        IS LESS THAN | IS NOT LESS THAN |
                        IS GREATER THAN | IS NOT GREATER THAN

<REL>              ::=  EQ | NE | GT | LT | GE | LE

<ZERO OR ONE>      ::=  0 | 1

<SECTION NAME>     ::=  PROC<Integer>

<PARAGRAPH NAME>   ::=  PARA-<Integer>

<LOOP PARAGRAPH NAME>        ::=  <PARAGRAPH NAME>

<ACTION PARAGRAPH NAME>      ::=  <PARAGRAPH NAME>

<INTERACTIVE PARAGRAPH NAME> ::=  <PARAGRAPH NAME>




<VARIABLE NAME>        ::=  X<Integer> | Y<Integer> | Z<Integer>

<QUANTIFICATION FLAG>  ::=  <VARIABLE NAME>

<COUNT VARIABLE NAME>  ::=  <VARIABLE NAME>

<TOTAL VARIABLE NAME>  ::=  <VARIABLE NAME>

<MIN VARIABLE NAME>    ::=  <VARIABLE NAME>

<MAX VARIABLE NAME>    ::=  <VARIABLE NAME>

<AVE VARIABLE NAME>    ::=  <VARIABLE NAME>

<STATISTIC VARIABLE NAME> ::=
                            <COUNT VARIABLE NAME> |
                            <TOTAL VARIABLE NAME> |
                            <MIN VARIABLE NAME> |
                            <MAX VARIABLE NAME> |
                            <AVE VARIABLE NAME>


Figure 2-11. A BNF description of the generated procedure.


The first three productions of Figure 2-11 are applicable to all COBOL programs.
These productions describe a <PROGRAM> as consisting of the statement
"PROCEDURE DIVISION." followed by one or more sections. A section, in turn,
begins with a statement naming the section, followed by one or more
paragraphs. These three productions are perhaps somewhat different from
their equivalent in a COBOL BNF description because they differentiate between
the first section and all other sections and also between the first paragraph and
all other paragraphs.

The need for this latter distinction is that the <MAIN PARAGRAPH> in the
generated programs performs four special functions. Prior to execution of the
rest of the program it opens all necessary areas and enters the data base. At
the end of execution it closes all areas opened at the beginning and stops
execution.


All paragraphs other than the <MAIN PARAGRAPH> are one of three types. The
first type is a paragraph that can be repetitively performed. The other two are
generated for conditional execution.


Entry into the data base is done with a data base port. A port can be a record
accessed using a calculated key value, a special type of set, or a search of
an entire area containing the port record. When a calculated key is used, then
the <FIND> and <INIT CALCKEY> productions are used. Note the possibility of
generating an interactive program in the <INIT CALCKEY> production.


If a calculated key cannot be used, the program will be constructed to contain
an initial loop which either searches an area or a special type of set, known as a
system set. In either case, the constructs used are described by the <LOOP
CONTROL> and <LOOP BODY> productions. Note here that the <LOOP CONTEXT>
production describes the differentiation between loops searching a set vs those
searching an area.


The <LOOP CONTROL> and <LOOP BODY> productions are two of the most
important in the BNF description. <LOOP BODY> describes the construction of
Ihe  „ pc„„,„ed  Ihe  p,os„„  ,ccho„  described  wdh  -LOOP 

CONTPOlr.  For  every  <LOOP  CONrPOl>  there  will  be  , p„„pe  <LOOP  nODV>. 

This  reslr„ii„n  cenep,  be  ind.Ced  wdh  the  Bttr  becepse  COBOL  uses  labels  lo 
coniro,  ereculioe  ^ 

Aleol-ilKo  Mne,„a,e  ,us,„,  pesbhe  ol  proy.ra,  blocKs  lo  control  program  How, 

M-en  Ihe  connerbon  between  <LOOP  C0NTP0L>  and  dOOP  B0DV>  could  be  more 

s.ron^ly  described  II*  ONF.  , have  d,st,„e,„.shed  between  LOOP.  ACTIOM,  and 

interactive  paracraph  names  In  tlw  BNF  lo  |„d,ca|ed  these  program  llo„ 

connect, ons,  Th,s  dlllerence  and  others  between  Algo,  and  COBOL  Is  the 
subject  of  section  2.10. 

<LOOP  CONTROLS  and  <100P  BODY,  control  Ihe  code  generation  lor  every 
traversal  tro,„  a sect, on  of  code  tor  a puery  level  lo  a sect, on  ot  code  tor  a 

lower  level.  The  <L,X,P  C0NTRa>  and  <LOOP  BODYs  produCons  correspond  to 
the  code  BPnoratod  for  a LtNKS  assertion. 


Another important production is the <ACTION> production. The <ACTION>
production describes the procedure for performing the activities of a
query level. These activities include matrix display, statistical
calculation, quantification, and other condition testing. The action associated
with quantification is the setting of the quantification flag which determines the
truth of the quantification.

<ACTION> can occur in a <LOOP BODY>, following a FIND <PORT>, or within an
<ACTION PARAGRAPH>. The code generated by <LOOP BODY> and <FIND> insures that
certain records are available so that the activity can be performed. Similarly,
the code generated by an <IF> production will test to see if the activity should
be performed.

Since the <IF> construct is part of the <CONDITION TEST> production which in
turn can be an <ACTION>, it is possible for code to be constructed which
consists of an IF statement referencing a paragraph containing another IF,
referencing a second paragraph containing a third IF, and so on. This is exactly
the program construction that results for every conjunction of tests in a query
condition. Section 2.9 explains the reason for the use of this programming style
rather than one which simply reproduces the entire condition in a single IF
statement.
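
The chaining of IF paragraphs described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the function name
chain_if_paragraphs, the paragraph numbering, and the PARA-ACTION label are
hypothetical, and the sketch emits a simplified "IF test PERFORM next" form
rather than the "NEXT SENTENCE ... ELSE PERFORM" form of the generated programs.

```python
def chain_if_paragraphs(tests, action, first_para=201):
    """Emit one COBOL-style paragraph per test in a conjunction.

    Each paragraph holds a single IF. When its test passes it PERFORMs the
    paragraph holding the next test; the last one PERFORMs the action
    paragraph. Names and numbering are illustrative only.
    """
    lines = []
    for i, test in enumerate(tests):
        para = first_para + i
        is_last = i == len(tests) - 1
        target = action if is_last else f"PARA-{para + 1}"
        lines.append(f"PARA-{para}.")
        lines.append(f"    IF {test} PERFORM {target}.")
    return lines


paragraphs = chain_if_paragraphs(
    ["SALARY IS LESS THAN 10000", "AGE IS LESS THAN 21"],
    action="PARA-ACTION",
)
print("\n".join(paragraphs))
```

Each test of the conjunction lands in its own paragraph, so a failing test
simply falls out of the chain without evaluating the tests that follow it.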

The recursive nature of the program construction process becomes obvious
when we observe that any <ACTION> can involve one or more <VALUE
DETERMINATION>. <VALUE DETERMINATION> can be done with a <STATISTIC
CALC> which will in turn evoke a new <LOOP CONTROL>. The general purpose
of the <VALUE DETERMINATION> and <CONDITION VALUE> constructs is to
generate code that will determine the values of the arguments needed by the
function invoked by <ACTION>. <VALUE DETERMINATION> or <CONDITION
VALUE> precede the actual display, calculation, or test processes.



PROC1 SECTION.

    IF SALARY IS NOT LESS THAN 6000
        PERFORM PARA-201
    ELSE PERFORM ...

PROC2 SECTION.

PARA-201.
    IF SALARY IS NOT LESS THAN 10000 NEXT SENTENCE
    ELSE PERFORM PARA-202.

PARA-202.
    MOVE 0 TO X1.
    FIND FIRST DEPENDENT RECORD OF DEPSET SET.
    MOVE CURRENCY STATUS FOR DEPSET SET TO DEPKEY.
    PERFORM PARA-203 UNTIL (ERRORSTATUS IS NOT EQUAL TO 0)
        OR (X1 IS EQUAL TO 1).
    IF X1 IS NOT EQUAL TO 1 NEXT SENTENCE
    ELSE PERFORM PARA-204.

PARA-204.
    program continuation...

PARA-203.
    MOVE CURRENCY STATUS FOR DEPSET TO Y1.
    GET DEPENDENT.
    IF AGE IS NOT LESS THAN 21 NEXT SENTENCE
    ELSE PERFORM PARA-205.
    MOVE Y1 TO DEPKEY.
    FIND DEPENDENT USING DEPKEY.
    FIND NEXT DEPENDENT RECORD OF DEPSET SET.

PARA-205.
    MOVE 1 TO X1.


Figure 2-12. Code generated for the condition of Figure 2-5.


Figure 2-12 illustrates the procedure that is generated for the test, "salary less
than $6,000 or salary less than $10,000 and any dependent under age 21." (The
HI-IQ equivalent is illustrated in Figure 2-5.)

PROC1 in Figure 2-12 tests "salary less than $6000." If this is true then the
appropriate procedure is executed. If not true, then PARA-201 is performed.
PARA-201 checks the first test in the conjunction "salary less than $10,000 and
the existence of any dependent child." This section of code is therefore
executed only if salary is not less than $6000, in other words, only if the first
disjunct of the condition fails. Code for secondary disjuncts is always located in
a separate section of the Procedure Division.

PARA-201 is a simple <CONDITION TEST> containing an <IF> with no preceding
value determinations. Since this paragraph will be executed within the scope of
a PERFORM, NEXT SENTENCE will not cause execution of the succeeding
paragraph.

PARA-202 is also a <CONDITION TEST>, but this time the <IF> is preceded by a
<CONDITION VALUE> construct. In this case the <CONDITION VALUE> is a
quantification so the first statement of the paragraph is a MOVE to initialise X1,
the quantification flag. The remainder of the paragraph, preceding the IF
statement, is a <LOOP CONTROL>. The <CONTROL TEST> for the loop contains a
<SYSTEM TEST>.


PARA-203 is a <LOOP BODY> and the <ACTION> it contains is another
<CONDITION TEST>. Y1 is used to store the data base key and a direct FIND is
done with that data base key. The FIND using the data base key is done to
recover the proper data base context in case intervening processing has also
accessed the DEPSET set. It turns out that this precaution is not necessary for
this particular program, but making inclusion of the precaution dependent on
necessity would involve considerable backtracking during the program
generation process.

PARA-205 is another <ACTION PARAGRAPH> consisting of a single <SET Q
FLAG>. Upon entry to PARA-204 the entire condition is true, and PARA-204
continues with the processing specified at that level.
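
As a reading aid, the control flow of Figure 2-12 can be mirrored in Python.
This is a sketch rather than a translation of the generated COBOL: the function
name condition_holds and the representation of the DEPSET members as a plain
list of ages are assumptions made for illustration, and the data base key
bookkeeping (Y1, DEPKEY) is omitted.

```python
def condition_holds(salary, dependent_ages):
    """Mirror the control flow of Figure 2-12.

    Tests "salary < 6000 OR (salary < 10000 AND any dependent under 21)".
    """
    # PROC1: the first disjunct.
    if salary < 6000:
        return True
    # PARA-201: first test of the conjunction in the second disjunct.
    if not salary < 10000:
        return False
    # PARA-202: initialise the quantification flag, then loop.
    x1 = 0
    for age in dependent_ages:   # FIND FIRST / FIND NEXT over the set
        if age < 21:             # PARA-203: test applied to each member
            x1 = 1               # PARA-205: set the quantification flag
        if x1 == 1:              # loop control: stop once the flag is set
            break
    # PARA-202, final IF: the condition is true only if the flag was set.
    return x1 == 1
```

For example, condition_holds(8000, [30, 12]) is true because the second
dependent is under 21, while condition_holds(12000, [5]) is false because
neither salary test succeeds.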


This terminates the example. Some additional comments regarding the BNF of
Figure 2-11 are in order:

a)  The principle of downward attribute migration is
reflected in the <FINDOWNER SEQUENCE> as used in <VALUE
DETERMINATION>.

b)  The <RUNTIME> and <INTERACTIVE PARAGRAPH>
productions generate code for the interactive determination
of parameter values. Note that for "ordinary" interactive
value determination, the interaction occurs only once during
program execution (because after being executed once, the
parameter will not have HIGH-VALUES). Interaction to
determine the value of a calculated key, as indicated in <INIT
CALCKEY>, will occur every time another direct access is
desired.

c)  The latter portion of the right hand side of <STATISTIC
CALC> reflects the fact that an average is actually the
quotient of two other statistics.


d)  The use of <SYSTEM TEST> in <CONTROL TEST> helps to
reduce execution time of the generated procedure by
terminating loop execution if the loop was invoked to
determine a value for a conditional test, and if the outcome of
the test can be determined from the present value of the
arguments. This pre-testing is applicable for certain
statistics whose value changes monotonically during loop
iterations. See also section 2.9.


e)  The <DISPLAY SEQUENCE> production can invoke <LOOP
CONTROL> directly in case the display of a sub-matrix was
indicated in the query.


f)  ERRORSTATUS is a Data Management System (DMS)
variable which is set non-zero by the DMS when an error
occurs or if all records in a set or area have been processed.
The programs generated by the system would be better if
they were able to distinguish between expected and
unexpected error conditions by checking the actual value
placed in ERRORSTATUS by the DMS.


g)  The second alternative in the <CONTINUATION>
production (invoked by <IF>) is used if the <TEST> tested in
the IF statement is not part of the last disjunct in a
disjunctive condition. The <ACTION PARAGRAPH NAME>
paragraph will contain the code to test the other disjuncts of
the condition. Such a paragraph will always be the first
paragraph in a sub-section.


h)  Two additional constructs for determining a value are
provided if the value is to be used in a <CONDITION TEST>,
e.g. as a <CONDITION VALUE>. A direct access is allowed
(to force the condition to be true) if the test contains an
equality relation involving a calculated key.
<QUANTIFICATION> will be invoked if the test involved was
constructed by the Request Handler for checking
quantification. <QUANTIFICATION> results in the generation
of code which determines the value of the quantification flag.
The code so generated is similar to the code generated for
the calculation of a statistic.


i)  <SET Q FLAG> can be used several ways in an <ACTION>.
If <SET Q FLAG> by itself comprises the action, then it will be
the only statement in an action paragraph which is
conditionally executed. <SET Q FLAG> will precede a
<CONDITION TEST> only if that test is universally quantified.
The flag is set false preceding the test so that it indicates
false if the test fails. If the test succeeds, the flag is reset
to true. <SET Q FLAG> following a <DISPLAY SEQUENCE>
occurs only for displays in the scope of a ONE command.
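
The pre-testing mentioned in comment d) can be illustrated with a count, which
never decreases while a loop runs. The sketch below is an assumption-laden
illustration, not the generated code: the function name count_exceeds, the
predicate argument, and the returned pair are all hypothetical.

```python
def count_exceeds(records, predicate, threshold):
    """Decide "COUNT of matching records > threshold" with pre-testing.

    COUNT changes monotonically (it never decreases), so the loop can stop
    as soon as the running count passes the threshold. Returns the truth
    value and the number of records actually examined.
    """
    count = 0
    examined = 0
    for record in records:
        examined += 1
        if predicate(record):
            count += 1
        if count > threshold:    # outcome is already determined
            return True, examined
    return False, examined
```

With ages [25, 10, 8, 40, 3] and the test "more than one dependent under 21",
the loop stops after the third record instead of scanning all five. A statistic
such as an average offers no such shortcut, because its value can move in
either direction as further records are read.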


2.7 Assertions.

A Frame consists of a set of logical statements or rules. These rules are of
four different types as discussed in Section 2.2.2 and reviewed here:

S1  Primitive procedure rule.

S2  Iterative rule.

S3  Definition rule.

S4  Axiom.


Rules and the current state are expressed with assertions. Each assertion must
correspond to a template with a semantic interpretation. For example, the
assertion CONTAINS(PATIENT,PATNO) states that the PATIENT record contains
the PATNO item.

Evaluation of an assertion determines if it is true or false in one of several
ways:

(a)  if previously stated to be true or false (i.e. true or
false in the current state);

(b)  by evaluating a rule which has the assertion in a
post-condition;

(c)  by evaluating a LISP function.

Assertions may contain variables which can become bound when the assertion is
evaluated. If CONTAINS(PATIENT,PATNO) is true in the current state, then
evaluation of CONTAINS(recx,PATNO) will bind recx (a variable) to PATIENT.
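
Evaluation against the current state, including the binding of recx, can be
sketched as a small pattern matcher. This is a minimal illustration in Python,
not the Micro-Planner mechanism the system relies on: the tuple representation
of assertions and the names match and evaluate are assumptions, and only
way (a) above (lookup in the current state) is covered.

```python
def match(pattern, fact, variables):
    """Match one assertion pattern against one stored fact.

    Both are tuples such as ("CONTAINS", "PATIENT", "PATNO"). Names listed
    in `variables` may bind to any constant. Returns a dict of bindings,
    or None when the two do not match.
    """
    if len(pattern) != len(fact):
        return None
    bindings = {}
    for p, f in zip(pattern, fact):
        if p in variables:
            # A variable must bind consistently if it occurs twice.
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return bindings


def evaluate(pattern, state, variables=()):
    """Return the bindings from the first matching fact, or None."""
    for fact in state:
        bindings = match(pattern, fact, set(variables))
        if bindings is not None:
            return bindings
    return None
```

With state = [("CONTAINS", "PATIENT", "PATNO")], the call
evaluate(("CONTAINS", "recx", "PATNO"), state, ["recx"]) yields a binding of
recx to PATIENT. An assertion without variables that is present in the state
yields an empty dict, so callers should compare the result with None rather
than test its truthiness.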


DMLP uses 58 different types of assertions. Six of these were described in
Figure 2-8. Figure 2-13 describes the five used to define the data base
structure. An earlier paper [Gerritsen 1974] illustrates the ease of conversion
from a Data Definition Language specification of a data base to a set of
assertions.


INAREA(RECORD,AREA)

    RECORD is contained in AREA.

CONTAINS(RECORD,ITEM)

    ITEM is contained in RECORD.

DBKEY(RECORD,ITEM)

    ITEM is a data base key for RECORD.

CALCKEY(RECORD,ITEM)

    ITEM is a direct access attribute (calculated key) of RECORD.

HIERARCHYGROUP(RECORD1,RECORD2,SET)

    RECORD1 is the owner of SET, and RECORD2 is a member of
    SET.


Figure 2-13. Assertions used to describe the data base.
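
To suggest how mechanical the conversion from a data definition to these
assertions can be, here is a minimal Python sketch. The dictionary layout of
the schema, the function name schema_to_assertions, and the EMPLOYEE record
with its items are assumptions made for illustration; they are not the Data
Definition Language or the converter of the cited paper, and DBKEY assertions
are left out.

```python
def schema_to_assertions(records, sets):
    """Turn a toy schema description into Figure 2-13 style assertions.

    `records` maps a record name to {"area", "items", optional "calckey"}.
    `sets` maps a set name to an (owner, member) pair.
    """
    assertions = []
    for name, spec in records.items():
        assertions.append(("INAREA", name, spec["area"]))
        for item in spec["items"]:
            assertions.append(("CONTAINS", name, item))
        if "calckey" in spec:
            assertions.append(("CALCKEY", name, spec["calckey"]))
    for set_name, (owner, member) in sets.items():
        assertions.append(("HIERARCHYGROUP", owner, member, set_name))
    return assertions


# Hypothetical schema: only DEPENDENT and DEPSET appear in Figure 2-12.
example = schema_to_assertions(
    records={
        "EMPLOYEE": {"area": "AREA1", "items": ["EMPNO", "SALARY"],
                     "calckey": "EMPNO"},
        "DEPENDENT": {"area": "AREA1", "items": ["AGE"]},
    },
    sets={"DEPSET": ("EMPLOYEE", "DEPENDENT")},
)
```

Each schema entry contributes a fixed number of assertions, which is why the
conversion needs no search or inference.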


2.7.1  Assertions for the S1 rules.

Figure 2-14 contains the assertions which describe the results of single program
steps. These assertions occur as post-conditions of rules of type S1. Those
parameters which are underlined in Figure 2-14 have a uniqueness property.
For example, the system will insure that a particular ITEM will contain only one
VALUE: if the assertion C(X1,0) has been made followed by a later assertion
C(X1,1), then the system will erase the first assertion.
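
The uniqueness property can be sketched as a state store that erases an older
assertion when a newer one differs only in the unique parameter. The class
below is a hypothetical illustration in Python. Which parameters carry the
property cannot be read from this copy of Figure 2-14, so the single entry for
C is taken only from the C(X1,0) and C(X1,1) example in the text.

```python
class State:
    """Current state as a list of assertion tuples.

    `unique_positions` maps an assertion name to the tuple position of its
    unique parameter (position 0 is the assertion name itself).
    """

    unique_positions = {"C": 2}   # assumed from the C(X1,0)/C(X1,1) example

    def __init__(self):
        self.facts = []

    def assert_fact(self, fact):
        pos = self.unique_positions.get(fact[0])
        if pos is not None:
            # Erase earlier assertions that differ only in the unique slot.
            self.facts = [
                old for old in self.facts
                if not (
                    len(old) == len(fact)
                    and all(a == b
                            for i, (a, b) in enumerate(zip(old, fact))
                            if i != pos)
                )
            ]
        if fact not in self.facts:
            self.facts.append(fact)
```

Asserting ("C", "X1", 0) and then ("C", "X1", 1) leaves only the second
assertion in the state, while an assertion about a different item, such as
("C", "X2", 5), is unaffected.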

OPENED(AREAS)

    AREAS is a list of areas that have been opened.

CLOSED(AREAS)

    AREAS is a list of areas that have been closed.

STOPPED(NAME)

    The program NAME has been stopped.

ACCEPT(VARIABLE,ITEM,RELATION)

    VARIABLE contains the value entered by the user in response
    to the prompt "ITEM RELATION?"

CURRENT(RECORD,LEVEL)

    RECORD is current at LEVEL, e.g. the named record has
    been found within the program segment associated with the
    level identifier.

INCORE(RECORD,LEVEL)

    The named RECORD is in core and available for processing to
    the program segment associated with LEVEL.

C(ITEM,VALUE)

    The named ITEM contains the given VALUE.

ANYOUTPUT(ITEMS,LEVEL)

    ITEMS is a list of the columns of a matrix (that has been
    output at the given level).

FOUNDOWNER(RECORD1,RECORD2,SET,LEVEL)

    The named RECORD1 has been found via the SET using
    RECORD2, a member of that SET, and both records are now
    current at the LEVEL specified.

FOUNDNEXT(TYPE,RECORD,UNIT,LEVEL)

    The next RECORD of the specified UNIT, which is either an
    Area or Set as specified in TYPE, has been found and is
    current for the LEVEL specified.

FOUND(RECORD,ITEM,VALUE,LEVEL)

    The named RECORD has been found using ITEM as a
    calculated key with the given VALUE such that RECORD is
    current for the specified LEVEL.

FOUNDUSING(RECORD,KEY,LEVEL)

    The named RECORD has been found using the data base KEY
    and is current for the specified LEVEL.

FOUNDFIRST(TYPE,RECORD,UNIT,LEVEL)

    The named RECORD has been found as the first record of the
    specified UNIT, which is either a Set or Area as specified by
    the value of TYPE. The record is current for the specified
    LEVEL.


Figure 2-14. Assertions which indicate the results of single
program statements.


There is a semantic redundancy in each of the FOUND... assertions and the
CURRENT assertion since a found record must be current. It would be possible
to eliminate this redundancy with a rewriting of the rules, and such a rewrite
would lead to a more concise system. On the other hand, the redundancy
serves to guard against improper program generation if (for example) a
particular rule has an incomplete pre-condition.

The assertions in Figure 2-14 appear to describe the status of an executing
program. The descriptions are more properly interpreted for program
generation if each is read as if preceded with the phrase "Code has been
generated such that...".


ISITEM(ITEM)

    ITEM is contained within some record as an attribute or data
    base key.

BCA**(COMMAND)

    COMMAND has the value "COUNT" or "AVE".

BTMMA**(COMMAND)

    COMMAND has the value "TOT", "MIN", "MAX" or "AVE".

=(A,B)

    A is equal to B.

EQ*(A,B)

    A is equal to B. This assertion differs from the preceding
    one in that its value can be uncertain if, for example, either A
    or B are program variables.

TEST(CONDITION,LEVEL)

    The CONDITION is true for the program segment defined for
    LEVEL. CONDITION is a list consisting of a relation and two
    arguments. When the arguments are program variables,
    TEST will have an uncertain value.


Figure 2-15. Other assertions.


2.7.2  Miscellaneous assertions.

Figure 2-15 contains a set of assertions which are difficult to classify. The first
is directly derivable from the assertions describing the data base structure.
The next three are used to test the values of their arguments, and the last two
are used to test values or insert code to test values.

2.7.3  Assertions for the S3 rules.

Figure 2-16 contains the remaining assertions. These can best be classified as
describing the general status of the program or of particular program blocks as
they are constructed. This is different from the assertions in Figure 2-14, which
describe the result of single program statements. All of the assertions of
Figure 2-16 are post-conditions of type S3 rules. S3 rules serve to insure the
proper composition of individual operators or other program blocks into larger
program blocks.


ALLFORFULINST(ACTION,CONDITION,LEVEL)

    The entire program segment for the given LEVEL has been
    constructed. ACTION indicates the command which
    established the purpose of the LEVEL and CONDITION
    contains the COND from the appropriate FOR assertion which
    must be true prior to the processing for ACTION.

ALLFOR(ACTION,CONDITION,LEVEL,DUMMY)

    CONDITION contains one of the disjuncts, itself a conjunction
    of tests, from the original COND list. This assertion states
    that the program segment to test the conjunction has been
    generated. DUMMY is a kludge variable which is kept
    unbound to force the proper construction of disjunctive tests.

GETPORT(RECORD)

    The named record has been used to establish entry into the
    data base.

PROGRAM(NAME)

    The named program has been written. This is the goal posed
    by the Request Handler for the Program Writer.

DOACTION(ACTION,LEVEL)

    Code for processing the named ACTION associated with the
    indicated LEVEL has been generated.

LINKED(ACTION,RECORD1,RECORD2,LEVEL,ARG1,ARG2)

    Code for linking from RECORD1 to RECORD2 has been
    generated, as well as all associated processing for the
    prescribed ACTION. This program segment is identified by
    LEVEL. ARG1 and ARG2 are used to bind the values of
    arguments that can be used for early termination of loops.
    ARG1 and ARG2 are therefore associated with ACTION.

UPLINKED(RECORD1,RECORD2,LEVEL)

    RECORD2 has been made current at the given LEVEL using
    only FIND OWNER operators to do so, starting from RECORD1.

DETVAL(ITEM,LEVEL,ARG1,ARG2,RELATION)

    The value for the ITEM has been determined with the
    program segment identified by LEVEL. The value has been
    stored in ARG1. ARG2 and RELATION are used to indicate
    the test in which the ITEM is used, if it is used in a test. The
    latter two parameters are used because value determination
    is sometimes dependent on the anticipated use of the item.

GETRUNT(VARIABLE,ITEM,RELATION)

    VARIABLE contains the value entered by the user in response
    to the prompt "ITEM RELATION?" This assertion is the
    equivalent of ACCEPT in Figure 2-14. This duplication is
    another kludge to effect the proper insertion of conditional
    tests.


Figure 2-16. Assertions used to describe program blocks.


2.7.4  LISP functions.

Some of the assertions used in the pre-conditions of rules are not used in
post-conditions of other rules or in the current state. Such assertions may be
evaluated by LISP functions. Assertions for which there exists a LISP function
have ** as the last two characters in their names. These assertions and their
meanings are listed in Figure 2-17.


DETALLVAL**(ITMS,ACTION)

    Code has been generated to determine the values, relative to
    ACTION, of all items in the ITMS list.

NEXTLEVOUT**(ITMS)

    Code has been generated to process all submatrices indicated
    by REPEAT or ONE commands in the ITMS list.

INITVARS**(VARLIST)

    Code has been generated to initialise all variables on the
    VARLIST list to zero.

DIVVARS**(DIVISOR,ITEMS)

    Code has been generated to divide every variable in the
    ITEMS list by the DIVISOR, e.g. to calculate averages.

STAT**(COMMAND)

    COMMAND is one of the statistical commands.

RETQ**(COMMAND)

    COMMAND is a retrieval quantifier (ANY or ALL).

REPQ**(COMMAND)

    COMMAND is a reporting quantifier (MAIN, REPEAT or ONE).

LITERAL**(ITEM)

    ITEM is a number or non-numeric literal (enclosed in single
    quotes).

BINDTYP**(RECORD)

    The variable TYP has been bound to the value "SET" unless
    RECORD has value "AREA" in which case TYP is also bound to
    AREA.

BINDITML**(ITEMS)

    The variable ITML has been bound to a list of printable items
    extracted from the ITEMS list. This is done to eliminate
    commands which cause the printing of sub-matrices and also
    to replace statistic commands with the variable containing the
    value of the statistic.

BINDVAL**(LIST,ITEM)

    The variable VAL is bound to the second argument of an
    equality test contained in the LIST which has ITEM as its first
    argument; e.g. (ITEM EQ VAL).

READL**(ITM,REL)

    This assertion returns the value "ITM REL?", which is used in
    the generated program to prompt the user.

CONTEST**(ACTION,ARG1,ARG2)

    This assertion returns a test which will be used for
    terminating a loop. Appropriate tests for early termination
    of the loop depending on the ACTION of the loop are also
    generated. An early termination test will involve ARG1 and
    ARG2.

UNCERTERRSTAT**()

    This assertion always evaluates to true. However it also
    insures that all knowledge about the value of ERRORSTATUS
    becomes uncertain. This is used to indicate that the value of
    ERRORSTATUS becomes unknown following a data base
    access.


Figure 2-17. Assertions evaluated by LISP.


Of the assertions in Figure 2-17, the first four are most important as their
evaluation will spawn other goals and will eventually result in additional code
generation. The second four assertions (STAT** through LITERAL**) are very
simple and return true or false depending on the value of their single
parameter. The next three assertions (all have BIND as the first four
characters) are used to bind variables to values extracted from lists. These
assertions are necessary because of the list structures contained in the
assertions generated by the Request Handler.
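
The list extraction performed by an assertion such as BINDVAL** can be
pictured with a short Python function. This is a hypothetical sketch: the real
assertions are LISP functions operating on the Request Handler's list
structures, and the triple layout (item, relation, value) used here is assumed
from the "(ITEM EQ VAL)" example in Figure 2-17.

```python
def bindval(tests, item):
    """Return the value that `item` is required to equal, or None.

    `tests` is a list of (item, relation, value) triples such as
    ("PATNO", "EQ", 1234). Only equality tests on `item` are considered.
    """
    for first, relation, value in tests:
        if first == item and relation == "EQ":
            return value
    return None
```

Given the tests [("AGE", "LT", 21), ("PATNO", "EQ", 1234)], a call with the
item PATNO returns 1234, whereas a call with AGE returns None because AGE takes
part in no equality test.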


The last three assertions are unusual in that they are not evaluated for truth or
failure. Instead, they return a value or change the state as is explained in
Figure 2-17.

Occasionally the pre-condition of a rule will include standard LISP or
Micro-Planner predicates (see [McCarthy et al 1972] and [Sussman and
Winograd 1972]). These predicates are illustrated in Figure 2-18.

Finally, Figure 2-19 gives an alphabetical index of the assertions and predicates
in Figures 2-8 and 2-13 thru 2-18.


THSETQ(A,B)

    Sets Micro-Planner variable A to the value of B.

THASVAL(A)

    True if variable A is bound to a value.

NULL(A)

    A is null.

CAR(A)

    Returns the first element of the list A.

CDR(A)

    Returns a list equivalent to A with its first element removed.


Figure 2-18. Standard LISP and Micro-Planner predicates used
in the rules.


ASSERTION        FIG.     ASSERTION          FIG.

ACCEPT           2-14     GETRUNT            2-16
ALLFOR           2-16     HIERARCHYGROUP     2-13
ALLFORFULINST    2-16     INITVARS**         2-17
ANYOUTPUT        2-14     LITERAL**          2-17
BCA**            2-15     INAREA             2-13
BINDITML**       2-17     INCORE             2-14
BINDTYP**        2-17     ISITEM             2-15
BINDVAL**        2-17     ISPORT             2-8
BTMMA**          2-15     ISVAR              2-8
C                2-14     LINKED             2-16
CALCKEY          2-13     LINKS              2-8
CAR              2-18     NEXTLEVOUT**       2-17
CDR              2-18     NULL               2-18
CLOSED           2-14     OPENED             2-14
CONTAINS         2-13     PROGRAM            2-16
CONTEST**        2-17     READL**            2-17
CURRENT          2-14     REPQ**             2-17
DBKEY            2-13     RETQ**             2-17
DETALLVAL**      2-17     STAT**             2-17
DETVAL           2-16     STOPPED            2-14
DIVVARS**        2-17     TEST               2-15
DOACTION         2-16     THASVAL            2-18
EQ*              2-15     THSETQ             2-18
FOR              2-8      TOBEOPND           2-8
FOUND            2-14     TOBEUSED           2-8
FOUNDFIRST       2-14     UNCERTERRSTAT**    2-17
FOUNDNEXT        2-14     UPLINKED           2-16
FOUNDOWNER       2-14     =                  2-15
FOUNDUSING       2-14
GETPORT          2-16

Figure 2-19. Index to Figures 2-8 and 2-13 through 2-18.


2.8  A Frame for the Semantics of Data Structures.

2.8.1  Rules of type S1.

Now that all of the assertions and predicates have been described, it is possible
to discuss the Frame that (in the logic of programs) forms an axiomatic
representation of the semantics of data structures and DML procedure. The
rules of the Frame which describe simple operators (type S1) will be discussed
first. Each of these rules is of the form P{A}R where A is a single program
operation or command. The post-condition R will be one or more of the
assertions of Figure 2-14.

Each rule in Figure 2-20 is described with A, the operation, followed in order by
P, the pre-condition, and R, the post-condition. The post-condition is further
identified by the symbol preceding it. Variables in each rule are underscored.

Operator rules are actually fairly simple, and only two will be discussed in detail.
The next to last rule of Figure 2-20, GET REC, is extremely simple and an English
interpretation is:

    If REC is CURRENT in a particular program segment identified
    by LEVEL then executing GET REC will insure that REC is
    INCORE and available for processing by the particular
    program segment.

FIND REC RECORD is somewhat more complicated and the following interpretation
will ignore some of the assertions. In English:

    If REC is a port and ITM is a calculated key and ITM is required to
    be equal to VAL then either determine VAL interactively, or if
    VAL is not equal to RUNTIME simply move VAL to ITM.
    Following the FIND, assert the record has been FOUND and is
    CURRENT but not INCORE. Also indicate that the value of
    ERRORSTATUS is uncertain.

Note the equivalence between this rule and the <FIND> production earlier
discussed.
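
The way an operator rule of the form P{A}R ties a statement to its
pre-condition and post-condition can be sketched in Python. The sketch applies
a rule in the forward direction for simplicity, whereas the system works
backward from goals. The dictionary layout, the function name apply_rule, and
the level identifier "L2" are assumptions made for illustration.

```python
def apply_rule(rule, state, program):
    """Apply an operator rule of the form P{A}R.

    `state` is a set of assertion tuples. If every pre-condition P is in
    the state, the statement A is appended to `program`, the
    post-conditions R are added to the state, and True is returned.
    """
    if not all(p in state for p in rule["pre"]):
        return False
    program.append(rule["statement"])
    state.update(rule["post"])
    return True


# GET REC instantiated for the DEPENDENT record at a hypothetical level.
get_dependent = {
    "statement": "GET DEPENDENT.",
    "pre": [("CURRENT", "DEPENDENT", "L2")],
    "post": [("INCORE", "DEPENDENT", "L2")],
}
```

Starting from a state that contains CURRENT(DEPENDENT,L2), applying
get_dependent emits "GET DEPENDENT." and adds INCORE(DEPENDENT,L2). Starting
from an empty state the rule does not apply and nothing is emitted.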


MOVE VAL TO DEST.

    P:  ISVAR(DEST)∨ISITEM(DEST)
    R:  C(DEST,VAL)

ACCEPT VAR.

    P:  [ISITEM(VAR)∨ISVAR(VAR)]∧
        ANYOUTPUT((EV (READL** ITM REL)),VAR)
    R:  ACCEPT(VAR,ITM,REL)∧C(VAR,RUNTIME)

DISPLAY ITML.

    P:  ¬NULL(ITML)
    R:  ANYOUTPUT(ITML,LEVEL)

OPEN AREAS.

    P:  TOBEOPND(AREAS)
    R:  OPENED(AREAS)

CLOSE AREAS.

    P:  TOBEOPND(AREAS)
    R:  CLOSED(AREAS)

FIND REC RECORD.

    P:  ISPORT(REC)∧CALCKEY(REC,ITM)∧FOR(FORLIST,LEVEL)∧
        [THASVAL(VAL)∨BINDVAL**((EV (CAR FORLIST)),ITM)]∧
        [¬=(VAL,RUNTIME)∨ACCEPT(ITM,ITM,EQ)]∧C(ITM,VAL)
    R:  FOUND(REC,ITM,VAL,LEVEL)∧CURRENT(REC,LEVEL)∧
        ¬INCORE(REC,LEVEL)∧UNCERTERRSTAT**()

FIND FIRST REC RECORD OF UNIT TYP.

    P:  [=(TYP,SET)∧HIERARCHYGROUP(OWN,REC,UNIT)∧
        [=(OWN,SYSTEM)∨CURRENT(OWN,LEVELX)]]∨
        [=(TYP,AREA)∧INAREA(REC,UNIT)]
    R:  FOUNDFIRST(TYP,REC,UNIT,LEVEL)∧CURRENT(REC,LEVEL)∧
        ¬INCORE(REC,LEVEL)∧UNCERTERRSTAT**()

FIND NEXT REC RECORD OF UNIT TYP.

    P:  [=(TYP,AREA)∧INAREA(REC,UNIT)]∨
        [=(TYP,SET)∧HIERARCHYGROUP(OWN,REC,UNIT)∧
        [=(OWN,SYSTEM)∨CURRENT(OWN,LEVELX)]]
    R:  FOUNDNEXT(TYP,REC,UNIT,LEVEL)∧CURRENT(REC,LEVEL)∧
        ¬INCORE(REC,LEVEL)∧UNCERTERRSTAT**()

FIND OWNER RECORD OF ST SET.

    P:  HIERARCHYGROUP(OWN,MEM,ST)∧¬=(OWN,SYSTEM)∧
        CURRENT(MEM,LEVELX)
    R:  FOUNDOWNER(OWN,MEM,ST,LEVEL)∧CURRENT(OWN,LEVEL)∧
        ¬INCORE(OWN,LEVEL)∧UNCERTERRSTAT**()

GET REC.

    P:  CURRENT(REC,LEVEL)
    R:  INCORE(REC,LEVEL)

STOP.

    P:  ¬STOPPED(NAM)
    R:  STOPPED(NAM)


Figure 2-20. Operator (type S1) rules.


2.8.2  Iteration rule - type S2.

Figure 2-21 contains the single iterative rule used by the DMLP. The code
generated by this rule matches that produced by <LOOP CONTROL> and <LOOP
BODY>. The rule is described with a pre-condition, loop invariant, iteration
step, iteration test and a post-condition. The pre-condition and iteration test
correspond to <LOOP CONTROL>. The loop invariant and iteration step are
captured in the <LOOP BODY>. The variable Y1 as used in the example of
Figure 2-12 is specifically introduced by the system to insure the loop invariant.

The only post-condition of the Iteration Rule is a LINKED assertion. This means
that a loop is constructed for the purpose of accessing the GREC record, i.e.
the goal which invokes the iteration is to find an access path from PREC to
GREC and then perform the ACTION. Since a single loop traverses a single
hierarchy it will establish a path from PREC to REC.

If REC is equal to GREC, the access path has been completed. If not, the rule
can be recursively entered to complete the path from REC to GREC. Such
recursive entry of the rule will result in nested loops. The STAYUP rule
(described below) insures an end to the recursion by checking if PREC and GREC
are the same, in which case the access path has been completed.

(CURSTAT,A,B,C) is a function which is expanded to "CURRENCY STATUS FOR A
B".



PRE-CONDITION: 

mr;i)iYP«tt(PRjaA 

[l[-(PPECjAREAMtIERARCHYGROlJP(PRf;C,GPLCW 

TilSrJO(RECjGiLEC)]vFllERARCHYGROUP(PHEC,REC,UNIl)]^ 

FOUNDFIRST(TYP.REC.UNIT.LEVEL)∧DBKEY(REC.DBK)∧
C(DBK.(CURSTAT.UNIT.TYP.0))

INVARIANT:

C(DBK.CURRV)

ITERATION STEP:

LINKED(ACTION.REC.GREC.LEVEL.AR1.AR2)∧
C(DBK.(CURSTAT.UNIT.TYP.CURRV))∧
FOUNDUSING(REC.DBK.LEVEL)∧FOUNDNEXT(TYP.REC.UNIT.LEVEL)


LOOP TERMINATOR:

CONTEST##((EV(CAR ACTION)).AR1.AR2)


POST-CONDITION:

LINKED(ACTION.PREC.GREC.LEVEL.AR1.AR2)


Figure  2-21.  The  iteration  rule. 




TOPPROG

LINKS(ACTION.PORT.GREC.LEVEL)∧OPENED(AREAS)∧GETPORT(PREC)∧
LINKED(ACTION.PREC.GREC.LEVEL.NIL.NIL)∧CLOSED(AREAS)∧
STOPPED(NAM)

→ PROGRAM(NAM)


PORTDEF

[ISPORT(REC)∧FOUND(REC.ITM.VAL.X)]∨
[¬ISPORT(DREC)∧LINKS(DU1.PORT.GREC.X)∧
THSETQ(REC.SYSTEM)∧UPLINK(GREC.REC)]∨
[¬ISPORT(DREC)∧THSETQ(REC.AREA)]

→ GETPORT(REC)


STAYUP

[[=(REC.GREC)∧CURRENT(REC.LEVEL)]∨
[¬=(REC.SYSTEM)∧¬=(REC.AREA)∧
UPLINKED(REC.GREC.LEVEL)]]∧
[¬=((EV(CAR ACTION)).ALL)∨C((EV(CADR ACTION)).1)]∧
[[¬FOR(FORLIS.LEVEL)∧DOACTION(ACTION.LEVEL)]∨
[FOR(FORLIS.LEVEL)∧
ALLFORFULINST(ACTION.FORLIS.LEVEL)]]

→ LINKED(ACTION.REC.GREC.LEVEL.AR1.AR2)


UPHIERARCHY

FOUNDOWNER(GREC.MEM.ST.LEVEL)∨
[FOUNDOWNER(OWN.MEM.ST.LEVEL)∧UPLINKED(OWN.GREC.LEVEL)]

→ UPLINKED(MEM.GREC.LEVEL)


ALLFORFULINSTDEF

ALLFOR(ACTION.(EV(CAR FORLIS)).LEVEL.DUMMY)∧
ALLFORFULINST(ACTION.(EV(CDR FORLIS)).LEVEL)

→ ALLFORFULINST(ACTION.FORLIS.LEVEL)


Figure  2-22  is  continued  on  the  next  page. 




ALLFORDEF

[NULL(FORLIS)∧DOACTION(ACTION.LEVEL)∧THSETQ(DUMMY.0)]∨
[THSETQ(ITM1.(EV(CADAR FORLIS)))∧
THSETQ(LEVEL1.(EV(CADDR(CAR FORLIS))))∧
THSETQ(ITM2.(EV(CADR(CDDAR FORLIS))))∧
THSETQ(LEVEL2.(EV(CADDR(CDDAR FORLIS))))∧
THSETQ(REL.(EV(CAAR FORLIS)))∧
DETVAL(ITM2.LEVEL2.ARG2.NIL.NIL)∧
DETVAL(ITM1.LEVEL1.ARG1.ARG2.REL)∧
TEST((EV(LIST ARG1 ARG2)).LEVEL)∧
ALLFOR(ACTION.(EV(CDR FORLIS)).LEVEL.DUMMY)]

→ ALLFOR(ACTION.FORLIS.LEVEL.DUMMY)

DETVALDEF 

THSETQ(ARG.ITM)a 

[LITRAL«m»(ITM)v 

[-(lJ{i^RUNTIME)ANEWVAR(ARG)AGETRUNT(ARG.ARG2.REL)V 
[CALCKEY(^ITM)a1SP0RT(REC)a-(8£L,EQ)a 
F0UND(REC.ITM.ARG2.LEVEL)a 
EQ  (ERRORSTATUS,0)]v 
[1S1TEM(ITM)aC0NTA1N$(REC.1TM)a 
INC0RE(REC.LEVEL)1v 

[[STAT<M»atM)vREPQ«»(ITM)vRETQ««t(lTM)lA 
L1NK$(ACTI0N.PREC.GREC.LEVEL)a 
[-BCA«(1TM)vC({EV(CADR  ACTION)), 0)1a 
r-RETQ«M<(ITM)vC((EV(CADR  ACTI0N)).'0)1a 
[-(IIM,ONE)vC«EV(CADR  ACTI0N)).0)1a 
hBTMMAa(ITM)vrTOBEUSED(ITMS.LEVEL)A 
INITVAR$e«t{ITM$)nA 
THSETQ(^{EV(CADR  ACTI0N)))a 
LINKED(ACT10N.PREC.GREC.LEVEL.ARG.ARG2)a 
[«(U[M,AVE)v 

DIWARS«m*((EV(CADR  ACTlQN)).ITMS)m 
-♦  DETVAL(ITM.LEVEL.ARG.ARG2.REL) 


Figure  2-22  is  continued  on  the  next  page. 




GETRUNTDEF

EQ(VAR.HIGH-VALUES)∧ACCEPT(VAR.ITM.REL)

→ GETRUNT(VAR.ITM.REL)


DOACTIONDEF

THSETQ(COMND.(EV(CAR ACTION)))∧
THSETQ(ARG.(EV(CADR ACTION)))∧
[¬REPQ##(COMND)∨
[TOBEUSED(ITMS.LEVEL)∧DETALLVAL##(ITMS.NIL)∧
GINDITML##(ITMS)∧
[NULL(ITMS)∨ANYOUTPUT(ITMS.LEVEL)]∧
NEXTLEVOUT##(ITMS)]]∧
[¬=(COMND.ALL)∨C(ARG.0)]∧
[¬=(COMND.ANY)∨C(ARG.1)]∧
[¬=(COMND.ONE)∨C(ARG.1)]∧
[¬BCA##(COMND)∨C(ARG.(ADD1 ARG))]∧
[¬BTMMA##(COMND)∨[TOBEUSED(ITMS.LEVEL)∧
DETALLVAL##(ITMS.ACTION)]]

→ DOACTION(ACTION.LEVEL)


FREETOGOUP

LINKS(DU1.DU2.REC.LEVEL)∧
[[CURRENT(REC.LEVEL)∧UPLINKED(REC.GREC.LEVEL)]∨
[CURRENT(XREC.LEVEL)∧UPLINKED(XREC.GREC.LEVEL)]]

→ CURRENT(GREC.LEVEL)


CURAX

¬NULL((EV(CAR LEVEL)))∧CURRENT(REC.(EV(CAR LEVEL)))

→ CURRENT(REC.LEVEL)


ITEMAX

DBKEY(REC.ITM)∨CONTAINS(REC.ITM)

→ ISITEM(ITM)


Figure  2-22.  Type  S3  and  S4  rules. 




2.8.3 Type S3 and S4 rules.

In Figure 2-22 each rule is labeled, the pre-condition is described and then the
post-condition is given preceded by a →. Four of these rules, ALLFORDEF,
ALLFORFULINSTDEF, UPHIERARCHY, and CURAX are recursive.

2.8.3.1 The STAYUP rule.

The STAYUP rule checks for completion of an access path. The first disjunction
in the pre-condition of STAYUP, consisting of the first four assertions, ensures
that either REC and GREC are the same and current or code has been generated
linking REC to GREC with FIND OWNER operations only.

The next disjunction of two assertions forces the quantification flag for ALL
(only if such quantification has been specified as the ACTION) to be temporarily
false. If the condition test succeeds then the flag will be reset to true. At the
end of the iteration, the flag will have its proper value.

The next disjunction of STAYUP (four assertions) specifies immediate
construction of the code for the specified ACTION if the activity is not
conditional. If the activity is conditional then ALLFORFULINST will ensure
generation of the code for conditional testing as well as actually performing the
action. STAYUP is equivalent to the <ACTION> production.

Rather than describing all of the remaining rules in detail, Figure 2-23 gives
correspondences between the productions of Figure 2-11 and the rules.
However, informal descriptions of several of the rules follow.


RULE                PRODUCTION

TOPPROG             <MAIN PARAGRAPH>
PORTDEF             <PORT>
STAYUP              <ACTION>
UPHIERARCHY         <FINDOWNER SEQUENCE> in <ACTION>
ALLFORFULINSTDEF,
ALLFORDEF           <CONDITION TEST>
DETVALDEF           <VALUE DETERMINATION>, <CONDITION VALUE>
GETRUNTDEF          <RUNTIME>
DOACTIONDEF         <DISPLAY SEQUENCE>, <STATISTIC SEQUENCE>
FREETOGOUP          <FINDOWNER SEQUENCE> in <VALUE DETERMINATION>

Figure 2-23. Rule and production correspondences.

2.8.3.2 The DOACTIONDEF rule.

DOACTIONDEF contains many disjunctions of the form ¬A∨B which of course has
the same effect as A⇒B.

ACTION is a list consisting of a command and a variable which is available for use
in satisfying the command. The first assertion extracts the command from
ACTION and stores it in COMND. The second assertion extracts the variable and
stores it in ARG.

The first disjunction (incorporating the next seven assertions) specifies that if
the COMND is a report quantifier, then generate the following, in order:

(a) Code to determine the values for all ITMS to be used in
this program segment (DETALLVAL##).

(b) Code to display the matrix (ANYOUTPUT).

(c) Code to handle all sub-levels within the context of the
current level (NEXTLEVOUT##).

The next three disjunctions (six assertions) cause code to be generated to set
ARG to zero or one if the COMND is a quantifier. The next to the last disjunction
(three assertions) causes ARG to be incremented by one if the COMND is a COUNT
or AVE.

The last disjunction in DOACTIONDEF causes code generation for processing the
ITMS matrix according to the statistical command given in COMND. This code will
either total all of the columns of the matrix or find the minimum or maximum
entry in each column.


2.8.3.3 The ALLFORDEF rule.

ALLFORDEF is another important rule. This is a recursive rule which controls
code generation for a conjunctive list of tests (contained in FORLIS). Each
iteration of the rule will result in code generation for one such test. The first
three assertions of ALLFORDEF will (a) terminate the recursion when all tests in
FORLIS have been taken care of (FORLIS is empty) and (b) cause code
generation for the conditional ACTION associated with FORLIS.

If FORLIS is not empty then the next five predicates, all THSETQ, extract the five
parameters describing the first test in FORLIS. These five parameters were
described in section 2.5.2. The next two assertions in ALLFORDEF, both
DETVAL, ensure code generation for the value determination of the two
arguments which are to be compared in the test. The next assertion causes
insertion of a conditional statement if the truth value of the relation is not
known. The last assertion causes the recursion and ensures the proper code
generation for the remaining tests in FORLIS.

Section 2.5.2 stated that a complete condition is actually a disjunctive list of lists
of test conjunctions, yet ALLFORDEF is apparently only processing a conjunctive
list of tests. That is why ALLFORFULINSTDEF, another recursive rule, is
included. This rule is entered once for each list of conjunctions, i.e., once for
each disjunct in the complete condition. In other words, ALLFORFULINSTDEF
passes the first list of tests (CAR FORLIS) to ALLFORDEF and then recursively
enters itself with the remaining lists, i.e., (CDR FORLIS).
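
The division of labor between the two rules can be sketched in Python. This is only an illustration under stated assumptions: the tuple layout (relation, item1, level1, item2, level2), the emitted strings, and the function names `all_for` and `all_for_full_inst` are hypothetical and are not taken from the DMLP. The empty-list base case of `all_for_full_inst` is also an assumption.

```python
def all_for(action, tests):
    """Sketch of ALLFORDEF: handle one test of a conjunctive list per call."""
    if not tests:
        # FORLIS is empty: emit the conditional ACTION and stop recursing.
        return [f"DO {action}"]
    relation, item1, level1, item2, level2 = tests[0]  # assumed parameter order
    code = [
        f"DETERMINE {item2} AT {level2}",  # the second argument is determined first
        f"DETERMINE {item1} AT {level1}",
        f"TEST {item1} {relation} {item2}",
    ]
    return code + all_for(action, tests[1:])


def all_for_full_inst(action, condition):
    """Sketch of ALLFORFULINSTDEF: one call per list of conjuncts (per disjunct)."""
    if not condition:  # assumed base case, not shown in the rule itself
        return []
    return all_for(action, condition[0]) + all_for_full_inst(action, condition[1:])


# A condition with two disjuncts: one test in the first, two in the second.
condition = [
    [("GT", "PATAGE", "L1", "21", "L1")],
    [("EQ", "CODE", "L1", "7", "L1"), ("LT", "COST", "L1", "200", "L1")],
]
generated = all_for_full_inst("DISPLAY", condition)
```

Each disjunct ends with its own copy of the ACTION, which is consistent with the code duplication discussed in section 2.9.1.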

2.8.3.4 The CURAX axiom.

CURAX is a simple axiom. It is interesting because it is recursive and uses the
Dewey-decimal structure of LEVEL. CURAX specifies that a record is CURRENT
at a particular level if this level is within the context of another level in which
the record is already CURRENT. In other words, to find out if a record is
current at X.2.1, CURAX will see if it is current at X.2. (CAR LEVEL) returns the
value X.2 because X.2.1 is represented in a list structure as ((X.2).1).
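
A minimal Python sketch of this recursion follows. It assumes a level is modeled as a nested pair `(parent, n)` with a plain string for the outermost level, so X.2.1 becomes `(("X", 2), 1)`. The function name `is_current` and the `asserted` set are hypothetical stand-ins for the CURRENT assertions held by the system.

```python
def is_current(record, level, asserted):
    """Return True if record is current at level or at any enclosing level.

    asserted is a set of (record, level) pairs that are directly known
    to be current (an assumed stand-in for CURRENT assertions).
    """
    if (record, level) in asserted:
        return True
    if not isinstance(level, tuple):
        # An atom such as "X" has no enclosing level: the recursion ends.
        return False
    parent = level[0]  # plays the role of (CAR LEVEL)
    return is_current(record, parent, asserted)


asserted = {("DOCTOR", ("X", 2))}          # DOCTOR is current at X.2
inner = is_current("DOCTOR", (("X", 2), 1), asserted)   # asks about X.2.1
other = is_current("DOCTOR", ("X", 3), asserted)        # asks about X.3
```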

2.8.3.5 The PORTDEF rule.

It was brought out earlier that there are three methods of entry to a DBTG data
base. An entry point is called a port. The physical beginning of an area is a
port, as is a record which can be directly accessed. Finally, DBTG lets the user
define sets which are owned by the SYSTEM. What this really amounts to is
that the data management system stores the direct access key of the first
record of such a set in an internal table. A SYSTEM owned set is therefore
directly accessible and can be used as a port. Since a set or area can contain
more than one record, a program loop is required to access all of the records in
the set or area.

PORTDEF contains three disjunctions in its precondition, one disjunction for each
type of port. PORTDEF prefers the types of port in the
following order: direct access, SYSTEM owned set, area search. The first
two assertions will be true if a direct access is possible, and code has been
constructed for this purpose. The second disjunct is true only if the top level
record of the query (GREC) is directly or indirectly owned by the SYSTEM. If
neither of the two preferred ports can be used then an area search is forced
by setting REC to the value AREA.
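
The preference order can be sketched in Python as follows. The function name `select_port` and both of its collection arguments are assumptions made for illustration; the rule itself works on assertions in the data base of the Program Writer, not on lists and sets.

```python
def select_port(grec, keyed_records, system_owned):
    """Return the value that would be bound to the port record.

    keyed_records: records of the query that can be accessed directly
                   (hypothetical input; empty if no direct access is possible).
    system_owned:  records directly or indirectly owned by the SYSTEM
                   (hypothetical input).
    """
    if keyed_records:
        return keyed_records[0]   # first preference: direct access
    if grec in system_owned:
        return "SYSTEM"           # second preference: SYSTEM owned set
    return "AREA"                 # last resort: area search


port = select_port("DOCTOR", [], {"HOSPITAL", "DOCTOR"})
```

The three possible results correspond to the three values named in the text: the name of a direct access record, the keyword SYSTEM, or the keyword AREA.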


PORTDEF only generates code if a loop is used as a port. The code generation
for other ports occurs when the Program Writer attempts to satisfy the LINKED
goal following the GETPORT goal in TOPPROG. Satisfying this goal will generate
the code for the access path from PREC to GREC. Recall that GREC is the
top-level context record and that PREC has been bound by GETPORT to one of
three values: the name of the direct access record, the keyword SYSTEM or the
keyword AREA. Note that even if a direct access port is used it need not be
the top level record of the query.




2.9 Efficiency Considerations.

2.9.1 Code duplication.

The programs generated by the system are far from optimal. The generated
programs are frequently larger than necessary because of the repetition of
equivalent program blocks. This occurs if a particular block is needed in more
than one location in the program.

2.9.2 Data characteristics.

Program construction also occurs in the absence of any information about data
characteristics such as data volumes and distributions. As a result, arbitrary
choices are frequently made where alternative program constructions are
possible. One such arbitrary decision which can impact program efficiency
depending on unknown (to the system) characteristics is port selection.

The system prefers to use a SYSTEM owned set as a port rather than an area
search if a direct access port is not possible. Yet, if one of the target records
occurs relatively infrequently in a (physically) small area, then the area search
is preferable.

An area search may also be preferable if the alternative SYSTEM owned set
"meanders" through the data base. Searching such a set can require more
physical accesses than searching the entire area containing that set. This may
occur if the unit of physical access, called a page, contains more than one
record. In Figure 2-24 each large box is a page of an area, and each "X" is a
target record of the query. All of the target records are contained in a SYSTEM
owned set as illustrated by the arrows. It would require a total of seven page
accesses to traverse the set, yet an area search would require only four
accesses.


Figure 2-24. Area search vs a SYSTEM owned set search.


2.9.3 Conjunction testing.

Although the system ignores efficiency factors in most of its decisions, there
are some instances in which efficiency considerations are reflected in the
programs constructed. One such instance is exhibited in the construction of
tests for conjunctions. Rather than testing the entire conjunction in a single IF
statement, each conjunct is tested separately. Although this may lead to a
somewhat larger program than necessary, the separation of conjuncts will lead
to a more efficient execution.

The increased efficiency results because separating the tests minimizes the
number of test preparations that will be necessary to evaluate the complete
condition. Because all tests are part of a conjunction, the i-th test is only
necessary if the preceding (i - 1) tests are all true. Since a test preparation
can involve many data base accesses (in the calculation of a statistic, for
example), minimizing such test preparations can have a dramatic effect on
program execution time and accesses to secondary storage.
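
The saving can be illustrated with a small Python sketch. It assumes each test is modeled as a pair of a preparation step and a predicate; the name `evaluate_conjunction` and this pairing are hypothetical and only serve to count how many preparations are actually carried out.

```python
def evaluate_conjunction(tests):
    """Evaluate (prepare, predicate) pairs in order, stopping at the first failure.

    Returns the truth value of the conjunction and the number of
    preparations that had to be performed.
    """
    preparations = 0
    for prepare, predicate in tests:
        value = prepare()          # may stand for many data base accesses
        preparations += 1
        if not predicate(value):
            return False, preparations
    return True, preparations


tests = [
    (lambda: 18, lambda age: age > 21),                      # cheap test, fails
    (lambda: sum(range(1000)), lambda total: total > 200),   # costly preparation
]
result = evaluate_conjunction(tests)
```

Here the costly second preparation is never performed because the first conjunct already fails.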

2.9.4 Ordering of tests.

The  system  currently  generates  test  evaluations  in  the  order  they  are 
presented  in  the  query.  Re-ordering  these  tests  according  to  cost  of  the 
necessary  preparation  (lowest  first)  and  expected  failure  rate  (highest  first) 
would  further  increase  execution  efficiency. 
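
One possible reading of this re-ordering is sketched below in Python. The text does not say how the two criteria are to be combined, so the lexicographic key used here (cost first, failure rate as tie-breaker) is an assumption, as are the class `ConditionTest` and its fields.

```python
from dataclasses import dataclass


@dataclass
class ConditionTest:
    name: str
    preparation_cost: float   # e.g. expected data base accesses (assumed unit)
    failure_rate: float       # expected fraction of evaluations that fail


def order_tests(tests):
    """Sort by preparation cost (lowest first), then failure rate (highest first)."""
    return sorted(tests, key=lambda t: (t.preparation_cost, -t.failure_rate))


tests = [
    ConditionTest("statistic", 50, 0.9),
    ConditionTest("age", 1, 0.2),
    ConditionTest("code", 1, 0.7),
]
ordered = [t.name for t in order_tests(tests)]
```

Other combinations, such as ranking by cost divided by failure rate, would be equally consistent with the text.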

2.9.5 Early termination of PERFORM.

Increased efficiency in program execution also results from the code generated
by the system as described by the <SYSTEM TEST> production. The inclusion
of a <SYSTEM TEST> in a <CONTROL TEST> is necessary only if the loop is
controlled by the ONE quantifier. The <SYSTEM TEST> will terminate execution
of the loop after one line of the matrix has been displayed. The use of
<SYSTEM TEST> with the logical quantifiers is not necessary but strongly
suggested since there is no reason to continue execution of the loop as soon as
existential (ANY) quantification has been proved or universal (ALL) quantification
has been disproved. An example of this use can be seen in Figure 2-12.

A system test can also be used to terminate the execution of loops which
calculate a COUNT, a MINimum or a MAXimum, if the result of the calculation is to be
used in a test. This is possible because the value of such a calculation will vary
monotonically during execution of a loop. The value of a COUNT or a MAX
never decreases. The value of a MIN never increases. An AVErage will
never behave monotonically unless values are first sorted, and a TOTal is
monotonic only if its possible set of arguments contains numbers which are all of
the same sign.

Therefore, if the value which is being calculated in the loop is to be compared
with a known value, the loop can be terminated if the COUNT or MAX is greater
(or the MIN is less) than the known value. Note that the actual relation used in
the test is not used to determine the relation used in the system test. The
effect of the system test can be made stronger in certain cases depending on
the relation in which the statistic is to be tested. For example, if the test is
"COUNT LT 10", then the system test can be the stronger "COUNT GE 10" rather
than the standard "COUNT GT 10".


DMLP generates the "standard" system test. The simple extension to generate
stronger tests would definitely be warranted in a "production" implementation.
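
The difference between the standard and the stronger system test for "COUNT LT 10" can be sketched in Python. The function name `count_is_less_than` and the `strong` switch are hypothetical; the sketch only shows how a monotonically growing COUNT permits the loop to stop early.

```python
def count_is_less_than(records, limit, strong=True):
    """Decide "COUNT LT limit" while scanning as few records as possible.

    strong=True  stops on the stronger system test  COUNT GE limit.
    strong=False stops on the standard system test  COUNT GT limit.
    Returns the truth value of the test and the number of records examined.
    """
    count = 0
    for _ in records:
        count += 1
        stop = count >= limit if strong else count > limit
        if stop:
            return False, count
    return count < limit, count


with_strong = count_is_less_than(range(100), 10, strong=True)
with_standard = count_is_less_than(range(100), 10, strong=False)
```

With the stronger test the loop stops one record earlier, as soon as the outcome of "COUNT LT 10" can no longer change.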


In determining the values of "A" and "B" in (A REL B), the system first
determines the value of "B", then the value of "A". This permits early loop
termination in determining the value of "A" because the value of "B" is known.
Improving DMLP so that it can choose which of "A" or "B" to evaluate first would
permit early loop termination when determining the value of "B" as well.



2.10 Algol to COBOL Conversion.

Programs constructed by the APG system are written in an Algol-like language.
The DMLP (which uses the APG) must translate this to COBOL and DML.

Algol [Naur et al 1960], as its name suggests, is an algorithmic language. The
Algol-like language generated by the APG system uses labels very sparsely:
program labels are only referenced at disjunctive branches in the program. At
such a point of disjunction, the APG system generates a conditional branch to a
subroutine which encompasses all alternatives but one. Then code generation
continues for the excluded alternative. The current state and the goal for the
subroutines are saved. After completely generating the main program, the
system recalls the goal and state, and commences program generation of the
subroutines.

The conversion process from the Algol-like language transforms each such
subroutine to a sub-section of the COBOL program. The main routine becomes
the main section of the program. This explains the distinction in the BNF
description of section 2.6 between <MAIN SECTION> and <SUB SECTION>. The
conditional branch in the COBOL program is effected with a PERFORM of the first
paragraph in the sub-section. The first paragraph in the section is performed
(rather than the section itself) to ensure that program control will not flow
sequentially out of the first paragraph.


The need for sub-sections only arises if the query contains a disjunctive
condition. The sub-section is entered only if the first disjunct fails. The first
step in the sub-section tests the second disjunct. DMLP generates a branch to
a second sub-section if there is a third disjunct in the condition. If there are no
further disjuncts in the condition, then DMLP inserts the clause "NEXT
SENTENCE" in place of the conditional branch.

Labels are used in the Algol-like programs only to transfer program control to
subroutines. All program control within a routine is accomplished via
appropriate nesting of program blocks. Although COBOL allows nesting of IF's,
the nested IF's give a program a messy appearance, much less clear, in fact,
than the equivalent in Algol. Furthermore, COBOL does not allow the nesting of
blocks which may be repetitively executed. Each such block must be either a
named paragraph, a named section or a contiguous set of named paragraphs or
named sections. That is not to say that program control cannot be logically
nested; i.e., it is perfectly legal in COBOL to repetitively execute a block from a
block which is being repetitively executed. The point is that in COBOL the
nesting is not physical in the program text; instead, it is accomplished by
labeling the blocks and then using the labels as the object of a PERFORM.

Transformation from Algol structure to COBOL structure requires changing a
nested block structure to a sequentially labeled block structure. The process of
this transformation is simple.


The complete Algol-like program is stored in a LISP list structure in the obvious
way, i.e., each nested segment forms a sub-list. Conversion proceeds as
follows: translate each top level element of the list to the appropriate program
statement unless the element is itself a list, i.e., a nested program block. If
such a nested block is encountered then generate a new paragraph name and a
program statement which performs the new name, extract the sub-list containing
the nested block from the main list and push the new paragraph name followed
by each element of the sub-list onto the end of the main list. Further levels of
nesting are then properly processed in due course because each such block
eventually becomes a top level element of the main list.
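
The procedure just described can be sketched in Python, with nested Python lists standing in for the LISP list structure. This is a simplified illustration: statements are plain strings that are passed through untranslated, the UNTIL clause of the PERFORM is omitted, and the function name `flatten_blocks` and the starting paragraph number are assumptions.

```python
from collections import deque


def flatten_blocks(program, first_number=100):
    """Turn a nested list of statements into sequentially labeled paragraphs."""
    queue = deque([f"PARA-{first_number}."] + list(program))
    next_number = first_number + 1
    output = []
    while queue:
        element = queue.popleft()
        if isinstance(element, list):
            # Nested block: name it, perform it here, and defer its body.
            name = f"PARA-{next_number}"
            next_number += 1
            output.append(f"PERFORM {name}")
            queue.append(f"{name}.")
            queue.extend(element)
        else:
            output.append(element)
    return output


program = ["OPEN", ["GET DOCTOR", ["GET PATIENT"], "FIND NEXT DOCTOR"], "CLOSE"]
flat = flatten_blocks(program)
```

For this input the result lists PARA-100 with a PERFORM of PARA-101, then PARA-101 with a PERFORM of PARA-102, then PARA-102, so the doubly nested block surfaces as the last paragraph.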

In addition to structural differences between Algol and COBOL, there are also
syntactic differences. The syntactic conversion is accomplished by a simple
table look up process in a table of clause "masks". This process also cleans up
such things as double negatives in logical statements. For example, "¬LE" is
changed in two steps, first to "GT" and finally to "IS GREATER THAN".
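
The two-step conversion of a relation can be sketched in Python as two table look ups. Both tables and the function name `to_cobol_relation` are assumptions for illustration; the actual mask table of the DMLP is not reproduced in the text.

```python
# Step 1 table: the relation that is equivalent to a negated relation.
NEGATED = {"LE": "GT", "GT": "LE", "LT": "GE", "GE": "LT", "EQ": "NE", "NE": "EQ"}

# Step 2 table: the COBOL clause for each relation (assumed masks).
MASKS = {
    "GT": "IS GREATER THAN",
    "LT": "IS LESS THAN",
    "EQ": "IS EQUAL TO",
    "LE": "IS NOT GREATER THAN",
    "GE": "IS NOT LESS THAN",
    "NE": "IS NOT EQUAL TO",
}


def to_cobol_relation(relation):
    """Remove leading negations, then look up the COBOL clause."""
    operator = relation.lstrip("¬")
    negations = len(relation) - len(operator)
    if negations % 2 == 1:
        operator = NEGATED[operator]   # e.g. "¬LE" becomes "GT"
    return MASKS[operator]


clause = to_cobol_relation("¬LE")
```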

Figure 2-25 gives the Algol-like program generated by the APG system for the
query of Figure 2-6. Contrast this with Figure 2-26 which is the COBOL
equivalent of Figure 2-25.

The generated COBOL would be completely consistent with the specification in
[CODASYL 1970] if certain instances of the MOVE statement are replaced by a
COMPUTE statement. The generated programs may also make use of a MIN or
MAX function which is not standard in COBOL.




THE:GOAL: (PROGRAM P1);IS:ATTAINABLE:BY:THE:FOLLOWING:PROGRAM:

PROC1 (P1)
BEGIN
  OPEN((A1 A2));
  FINDFIRST(AREA DOCTOR A1);
  DDBKEY ← CURSTAT(A1 AREA);
  WHILE ¬NE (ERRORSTATUS 0) DO
  BEGIN
    Z2 ← CURSTAT(A1 AREA);
    GET(DOCTOR);
    DISPLAY((DOCNAME DOCAGE SPECIALTY));
    FINDFIRST(SET TREATMENT TREATING);
    TDBKEY ← CURSTAT(TREATING SET);
    WHILE ¬NE (ERRORSTATUS 0) DO
    BEGIN
      Z1 ← CURSTAT(TREATING SET);
      FINDOWNER(TREATMENTS);
      GET(PATIENT);
      IF ¬TEST((GT PATAGE (21)) (X D)) THEN
        PROC2
      ELSE
      BEGIN
        GET(TREATMENT);
        DISPLAY((PATNAME PATAGE DIAGNOSIS));
      END
      TDBKEY ← Z1;
      FINDUSING(TREATMENT TDBKEY);
      FINDNEXT(SET TREATMENT TREATING);
    END
    DDBKEY ← Z2;
    FINDUSING(DOCTOR DDBKEY);
    FINDNEXT(AREA DOCTOR A1);
  END
  CLOSE((A1 A2));
  STOP(P1);
END


Figure 2-25. Program P1 (Figure 2-6) in Algol.



THE:GOAL: (PROGRAM P1);IS:ATTAINABLE:BY:THE:FOLLOWING:PROGRAM:

PROCEDURE DIVISION.
PROC1 SECTION.
PARA-100.
    OPEN AREA A1 A2 .
    FIND FIRST DOCTOR RECORD OF A1 AREA .
    MOVE CURRENCY STATUS FOR A1 AREA TO DDBKEY .
    PERFORM PARA-101 UNTIL ERRORSTATUS IS NOT EQUAL TO 0
    CLOSE AREA A1 A2 .
    STOP .
PARA-101.
    MOVE CURRENCY STATUS FOR A1 AREA TO Z2 .
    GET DOCTOR RECORD .
    DISPLAY DOCNAME DOCAGE SPECIALTY .
    FIND FIRST TREATMENT RECORD OF TREATING SET .
    MOVE CURRENCY STATUS FOR TREATING SET TO TDBKEY .
    PERFORM PARA-102 UNTIL ERRORSTATUS IS NOT EQUAL TO 0
    MOVE Z2 TO DDBKEY .
    FIND DOCTOR USING DDBKEY .
    FIND NEXT DOCTOR RECORD OF A1 AREA .
PARA-102.
    MOVE CURRENCY STATUS FOR TREATING SET TO Z1 .
    FIND OWNER RECORD OF TREATMENTS SET .
    GET PATIENT RECORD .
    IF PATAGE IS NOT GREATER THAN 21 NEXT SENTENCE
    ELSE PERFORM PARA-103.
    MOVE Z1 TO TDBKEY .
    FIND TREATMENT USING TDBKEY .
    FIND NEXT TREATMENT RECORD OF TREATING SET .
PARA-103.
    GET TREATMENT RECORD .
    DISPLAY PATNAME PATAGE DIAGNOSIS .


Figure 2-26. Program P1 in COBOL.




2.11 Examples of Procedure Generation.

This section concludes Chapter 2 with several examples of procedure
generation. For each example the query is given followed by the generated
COBOL program. The specific features demonstrated by each query are
classified and indexed in Figure 2-27. Program P1 of the feature index is the
program of Figure 2-26.

Two different data base structures were used for these examples. The first is
for a community medical data base. This data base contains information on all
hospitals, doctors and patients in a community. The structure for this data base
is illustrated in Figure 2-28. The data structure diagrams use the following
conventions: record, set, and area names are in upper case, item names are in
lower case, and calculated keys are underlined. Area boundaries are indicated
with a dashed line. Other conventions of DS diagrams were discussed in
Chapter One.

Each record type also has a working storage register associated with it which
the DBTG calls a data base key. This data base key can be used to store the
data base address of a record occurrence. The data base key for each record
has been indicated in parentheses.


PROGRAM:                     P1  P2  P3  P4  P5  P6

CONDITIONS:
    COMPLEX
    DISJUNCTION
    CONJUNCTION
    QUANTIFICATION
    USE OF STATISTIC
DISPLAY:
    NESTED MATRICES
    CONFLUENT HIERARCHY
    DOWNWARD MIGRATION
    STATISTIC
    CONDITIONAL
    "ONE" COMMAND
INTERACTION:
    CALCKEY VALUE
    OTHER VALUE
PORT:
    CALCKEY
    SYSTEM SET
    AREA SEARCH
DATA BASE:
    MEDICAL
    SALES

[The X marks assigning features to programs are illegible in this copy.]


Figure 2-27. Index to the examples.


Figure 2-28. A community medical data base structure.


Figure 2-29. A sales data base structure.


Figure 2-29 illustrates the other data base structure used in the examples.
This data base might be the sales portion of an inventory system.


Program P2 (Figures 2-30, 2-31) will list (for a hospital specified at execution
time) all patients who have accumulated a total uninsured billing of over $200.
Uninsured billings are identified with one of two billing codes. This disjunction
leads to the separate section called PROC3. P2 illustrates the unnecessary
generation of duplicate paragraphs that may occur: PARA-105 is identical to
PARA-301.


i 

[ 

I 


k 

1 


Program P3 (Figures 2-32, 2-33) lists the average dollar value and average
number of unique items per order received for all active salesmen. The set of
active salesmen is defined at run time by giving the time interval during which
sales activity must have occurred. P3 makes extensive use of system generated
variables. X1 is the existential quantifier flag; X13 and X14 contain values
entered by the user at run-time. X4 accumulates a count of the number of
orders per salesman. X7 contains the total sold by a salesman. X6 contains the
total value of an order. X10 contains the number of entries per order. X11
contains the number of entries per salesman. X7 and X11 are reset to contain
averages in the last three statements of PARA-103.
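
The roles of these variables can be followed in a short Python sketch. It is a
restatement under assumptions, not the generated code: the function name
`salesman_averages` and the list-of-lists input are hypothetical, and the
sketch assumes the salesman has at least one order (the generated program only
reaches PARA-103 when the existential test has already found one).

```python
def salesman_averages(orders):
    """orders: one list per order, holding the extended cost (XCOST) of each entry.

    Returns (average value per order, average number of entries per order).
    Variable names mirror the system generated variables of program P3.
    """
    x4 = 0     # number of orders for this salesman
    x7 = 0.0   # total value sold by this salesman
    x11 = 0    # total number of entries for this salesman
    for entries in orders:
        x4 += 1
        x6 = 0.0               # total value of this order
        for xcost in entries:
            x6 += xcost
        x7 += x6
        x10 = 0                # number of entries on this order
        for _ in entries:      # a second pass, as in PARA-112
            x10 += 1
        x11 += x10
    # As in the last statements of PARA-103, the totals are replaced by averages.
    x7 = x7 / x4
    x11 = x11 / x4
    return x7, x11


averages = salesman_averages([[10.0, 20.0], [30.0]])
```

The two passes over each order's entries are kept on purpose, since the
generated program traverses the ORDERLIST set once for TOT and once for COUNT.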

Program P4 (Figures 2-34, 2-35) identifies possible unhappy customers (more
than three unfilled orders received prior to an interactively specified date) and
especially flags "important customers", those whose recent orders exceed
$10,000.




Program P5 (Figures 2-36, 2-37) simply lists any specified order and indicates
which items, if any, cannot be filled from inventory. This program makes heavy
use of downward migration, especially in PARA-101.

Program P6 (Figures 2-38, 2-39) is another one using the medical data base. It
generates a doctor’s cross reference: for a particular doctor, each patient and
all of each patient’s doctors are listed. It is for programs like these, which
traverse a confluent hierarchy in two directions, that conservation of the loop
invariant becomes important. The loop invariant is the current of set.


The report generated by P6 will list the top level doctor in many locations: at
the beginning of the report and with each patient (because he is one of the
doctors associated with each of his patients). This is a minor deficiency, but it
can be cured with a simple extension to the DMLP. Providing for the definition
of temporary variables would allow the user to differentiate between doctors at
different levels of the query. This would be accomplished by specifying the
storage of the doctor’s number in a temporary variable at the top level of the
query, e.g. SAVE DOCNO IN TDOC. Then retrieval at the third level would be
specified to be conditional on (DOCNO NE TDOC).
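
The effect of the proposed temporary variable can be sketched in Python. This
is only an illustration of the idea: the function name `cross_reference` and
the list of (doctor, patient) pairs standing in for TREATMENT records are
assumptions, not part of the system described here.

```python
def cross_reference(docno, treatments):
    """treatments: (docno, patno) pairs, a stand-in for TREATMENT records.

    Returns [(patno, [other doctors of that patient]), ...] for the doctor.
    """
    tdoc = docno  # SAVE DOCNO IN TDOC at the top level of the query
    report = []
    for doctor, patient in treatments:
        if doctor != docno:
            continue
        # Third level: every doctor of this patient, conditional on (DOCNO NE TDOC)
        others = [d for d, p in treatments if p == patient and d != tdoc]
        report.append((patient, others))
    return report


treatments = [(1, 10), (2, 10), (1, 11), (3, 11), (2, 12)]
report = cross_reference(1, treatments)
```

Without the `d != tdoc` test, doctor 1 would reappear under each of his own
patients, which is the deficiency described above.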




ENTER PROGRAM NAME P2
READ DSK:? T

PRIMARY RECORD (MAIN)
♦HOSPITAL
CONDITIONS FOR RETRIEVAL
♦(HOSPNO EQ RUNTIME)
♦NIL
ITEMS OR STATS TO BE DISPLAYED
♦HOSPNAME
♦HOSPNO
♦REPEAT
PRIMARY RECORD (REPEAT)
♦PATIENT
CONDITIONS FOR RETRIEVAL
♦(TOT GE 200)
PRIMARY RECORD (TOT)
♦BILLENTRY
CONDITIONS FOR RETRIEVAL
♦(CODE EQ ’X’)
♦OR
♦(CODE EQ ’Z’)
♦NIL
ITEMS OR STATS FOR TOT
♦AMOUNT
♦NIL
♦NIL
ITEMS OR STATS TO BE DISPLAYED
♦PATNAME
♦PATNO
♦NIL
♦NIL
POSSIBLE PORTS ARE:
(HOSPNO)
SELECT ONE OR TYPE NIL HOSPNO


Figure 2-30. Query P2: "For a hospital specified at run-time, list its name and
number and the name and number of all patients whose total billings of code ’X’
or ’Z’ exceed or equal $200."




THE:GOAL: (PROGRAM P2):IS:ATTAINABLE:BY:THE:FOLLOWING:PROGRAM:

PROCEDURE DIVISION.
PROC1 SECTION.
PARA-100.
    OPEN AREA A1 A2 .
    DISPLAY ’HOSPNO’ ’EQ?’ .
    ACCEPT HOSPNO .
    FIND HOSPITAL RECORD .
    IF ERRORSTATUS IS NOT EQUAL TO 0 NEXT SENTENCE
    ELSE PERFORM PARA-101.
    CLOSE AREA A1 A2 .
    STOP .
PARA-101.
    GET HOSPITAL RECORD .
    DISPLAY HOSPNAME HOSPNO .
    FIND FIRST PATIENT RECORD OF PATSET SET .
    MOVE CURRENCY STATUS FOR PATSET SET TO PDBKEY .
    PERFORM PARA-102 UNTIL ERRORSTATUS IS NOT EQUAL TO 0 .
PARA-102.
    MOVE CURRENCY STATUS FOR PATSET SET TO Z2 .
    MOVE 0 TO X5 .
    FIND FIRST BILLENTRY RECORD OF BILLINGS SET .
    MOVE CURRENCY STATUS FOR BILLINGS SET TO BDBKEY .
    PERFORM PARA-103 UNTIL ERRORSTATUS IS NOT EQUAL TO 0 .
    IF X5 IS LESS THAN 200 NEXT SENTENCE
    ELSE PERFORM PARA-104.
    MOVE Z2 TO PDBKEY .
    FIND PATIENT USING PDBKEY .
    FIND NEXT PATIENT RECORD OF PATSET SET .
PARA-103.
    MOVE CURRENCY STATUS FOR BILLINGS SET TO Z1 .
    GET BILLENTRY RECORD .
    IF CODE IS NOT EQUAL TO ’X’ PERFORM PARA-300
    ELSE PERFORM PARA-105.
    MOVE Z1 TO BDBKEY .
    FIND BILLENTRY USING BDBKEY .
    FIND NEXT BILLENTRY RECORD OF BILLINGS SET .
PARA-105.
    MOVE X5 + AMOUNT TO X5 .
PARA-104.
    GET PATIENT RECORD .
    DISPLAY PATNAME PATNO .

PROC3 SECTION.
PARA-300.
    IF CODE IS NOT EQUAL TO ’Z’ NEXT SENTENCE
    ELSE PERFORM PARA-301.
PARA-301.
    MOVE X5 + AMOUNT TO X5 .



ENTER PROGRAM NAME P3
READ DSK:? T

PRIMARY RECORD (MAIN)
♦SALESMAN
CONDITIONS FOR RETRIEVAL
♦ANY
PRIMARY RECORD (ANY)
♦ORDER
CONDITIONS FOR RETRIEVAL
♦(ORD-DATE GE RUNTIME)
♦(ORD-DATE LE RUNTIME)
♦NIL
♦NIL
ITEMS OR STATS TO BE DISPLAYED
♦MAN-NAME
♦DIST-NO
♦AVE
PRIMARY RECORD (AVE)
♦ORDER
CONDITIONS FOR RETRIEVAL
♦NIL
ITEMS OR STATS FOR AVE
♦TOT
PRIMARY RECORD (TOT)
♦ENTRY
CONDITIONS FOR RETRIEVAL
♦NIL
ITEMS OR STATS FOR TOT
♦XCOST
♦NIL
♦COUNT
PRIMARY RECORD (COUNT)
♦ENTRY
CONDITIONS FOR RETRIEVAL
♦NIL
♦NIL
♦NIL


Figure 2-32. Query P3: "For those salesmen who have any orders with an
order date that falls between two dates specified at run-time, display his name,
district number, the average cost per order, and the average number of items
per order."




THE:GOAL: (PROGRAM P3):IS:ATTAINABLE:BY:THE:FOLLOWING:PROGRAM:

PROCEDURE DIVISION.
PROC1 SECTION.
PARA-100.
    OPEN AREA MAINA .
    FIND FIRST SALESMAN RECORD OF MAINA AREA .
    MOVE CURRENCY STATUS FOR MAINA AREA TO KEY2 .
    PERFORM PARA-101 UNTIL ERRORSTATUS IS NOT EQUAL TO 0 .
    CLOSE AREA MAINA .
    STOP .
PARA-101.
    MOVE CURRENCY STATUS FOR MAINA AREA TO Z5 .
    MOVE 0 TO X1 .
    FIND FIRST ORDER RECORD OF SALES SET .
    MOVE CURRENCY STATUS FOR SALES SET TO KEY3 .
    PERFORM PARA-102 UNTIL
        ( ERRORSTATUS IS NOT EQUAL TO 0 ) OR
        ( X1 IS EQUAL TO 1 ) .
    IF X1 IS NOT EQUAL TO 1 NEXT SENTENCE
    ELSE PERFORM PARA-103.
    MOVE Z5 TO KEY2 .
    FIND SALESMAN USING KEY2 .
    FIND NEXT SALESMAN RECORD OF MAINA AREA .
PARA-102.
    MOVE CURRENCY STATUS FOR SALES SET TO Z1 .
    IF X13 IS NOT EQUAL TO HIGH-VALUES NEXT SENTENCE
    ELSE PERFORM PARA-104.
    GET ORDER RECORD .
    IF ORD-DATE IS LESS THAN X13 NEXT SENTENCE
    ELSE PERFORM PARA-105.
    MOVE Z1 TO KEY3 .
    FIND ORDER USING KEY3 .
    FIND NEXT ORDER RECORD OF SALES SET .
PARA-104.
    DISPLAY ’ORD-DATE’ ’GE?’ .
    ACCEPT X13 .
PARA-105.
    IF X14 IS NOT EQUAL TO HIGH-VALUES NEXT SENTENCE
    ELSE PERFORM PARA-106.
    IF ORD-DATE IS GREATER THAN X14 NEXT SENTENCE
    ELSE PERFORM PARA-107.
PARA-106.
    DISPLAY ’ORD-DATE’ ’LE?’ .
    ACCEPT X14 .
PARA-107.
    MOVE 1 TO X1 .
PARA-103.
    GET SALESMAN RECORD .
    FIND OWNER RECORD OF SALESMEN SET .
    GET DISTRICT-HDR RECORD .
    MOVE 0 TO X4 .
    MOVE 0 TO X7 .
    MOVE 0 TO X11 .
    FIND FIRST ORDER RECORD OF SALES SET .
    MOVE CURRENCY STATUS FOR SALES SET TO KEY3 .
    PERFORM PARA-110 UNTIL ERRORSTATUS IS NOT EQUAL TO 0 .
    MOVE X7 DIVIDED BY X4 TO X7 .
    MOVE X11 DIVIDED BY X4 TO X11 .
    DISPLAY MAN-NAME DIST-NO X7 X11 .
PARA-110.
    MOVE CURRENCY STATUS FOR SALES SET TO Z4 .
    MOVE X4 + 1 TO X4 .
    MOVE 0 TO X6 .
    FIND FIRST ENTRY RECORD OF ORDERLIST SET .
    MOVE CURRENCY STATUS FOR ORDERLIST SET TO KEY6 .
    PERFORM PARA-111 UNTIL ERRORSTATUS IS NOT EQUAL TO 0 .
    MOVE X7 + X6 TO X7 .
    MOVE 0 TO X10 .
    FIND FIRST ENTRY RECORD OF ORDERLIST SET .
    MOVE CURRENCY STATUS FOR ORDERLIST SET TO KEY6 .
    PERFORM PARA-112 UNTIL ERRORSTATUS IS NOT EQUAL TO 0 .
    MOVE X11 + X10 TO X11 .
    MOVE Z4 TO KEY3 .
    FIND ORDER USING KEY3 .
    FIND NEXT ORDER RECORD OF SALES SET .
PARA-111.
    MOVE CURRENCY STATUS FOR ORDERLIST SET TO Z2 .
    GET ENTRY RECORD .
    MOVE X6 + XCOST TO X6 .
    MOVE Z2 TO KEY6 .
    FIND ENTRY USING KEY6 .
    FIND NEXT ENTRY RECORD OF ORDERLIST SET .
PARA-112.
    MOVE CURRENCY STATUS FOR ORDERLIST SET TO Z3 .
    MOVE X10 + 1 TO X10 .
    MOVE Z3 TO KEY6 .
    FIND ENTRY USING KEY6 .
    FIND NEXT ENTRY RECORD OF ORDERLIST SET .

Figure 2-33. Program P3.




THE:GOAL: (PROGRAM P4):IS:ATTAINABLE:BY:THE:FOLLOWING:PROGRAM:

PROCEDURE DIVISION.
PROC1 SECTION.
PARA-100.
    OPEN AREA MAINA .
    FIND FIRST CUSTOMER RECORD OF CUSTOMERS SET .
    MOVE CURRENCY STATUS FOR CUSTOMERS SET TO KEY4 .
    PERFORM PARA-101 UNTIL ERRORSTATUS IS NOT EQUAL TO 0 .
    CLOSE AREA MAINA .
    STOP .
PARA-101.
    MOVE CURRENCY STATUS FOR CUSTOMERS SET TO Z5 .
    MOVE 0 TO X1 .
    FIND FIRST ORDER RECORD OF ORDERS SET .
    MOVE CURRENCY STATUS FOR ORDERS SET TO KEY3 .
    PERFORM PARA-102 UNTIL
        ( ERRORSTATUS IS NOT EQUAL TO 0 ) OR
        ( X1 IS GREATER THAN 3 ) .
    IF X1 IS NOT GREATER THAN 3 NEXT SENTENCE
    ELSE PERFORM PARA-103.
    MOVE Z5 TO KEY4 .
    FIND CUSTOMER USING KEY4 .
    FIND NEXT CUSTOMER RECORD OF CUSTOMERS SET .
PARA-102.
    MOVE CURRENCY STATUS FOR ORDERS SET TO Z1 .
    IF X15 IS NOT EQUAL TO HIGH-VALUES NEXT SENTENCE
    ELSE PERFORM PARA-104.
    GET ORDER RECORD .
    IF ORD-DATE IS NOT LESS THAN X15 NEXT SENTENCE
    ELSE PERFORM PARA-105.
    MOVE Z1 TO KEY3 .
    FIND ORDER USING KEY3 .
    FIND NEXT ORDER RECORD OF ORDERS SET .
PARA-104.
    DISPLAY ’ORD-DATE’ ’LT?’ .
    ACCEPT X15 .
PARA-105.
    IF SHIP-DATE IS NOT EQUAL TO 0 NEXT SENTENCE
    ELSE PERFORM PARA-106.
PARA-106.
    MOVE X1 + 1 TO X1 .
PARA-103.
    GET CUSTOMER RECORD .
    DISPLAY CUST-NAME .
    MOVE 0 TO X3 .
    FIND FIRST ORDER RECORD OF ORDERS SET .
    MOVE CURRENCY STATUS FOR ORDERS SET TO KEY3 .
    PERFORM PARA-107 UNTIL ( ERRORSTATUS IS NOT EQUAL TO 0 ) OR
        ( X3 IS EQUAL TO 1 ) .
    MOVE 0 TO X11 .
    FIND FIRST ORDER RECORD OF ORDERS SET .
    MOVE CURRENCY STATUS FOR ORDERS SET TO KEY3 .
    PERFORM PARA-110 UNTIL ERRORSTATUS IS NOT EQUAL TO 0 .
    IF X11 IS NOT GREATER THAN 10000 NEXT SENTENCE
    ELSE PERFORM PARA-111.
PARA-107.
    MOVE CURRENCY STATUS FOR ORDERS SET TO Z2 .
    FIND OWNER RECORD OF SALES SET .
    FIND OWNER RECORD OF SALESMEN SET .
    GET DISTRICT-HDR RECORD .
    GET SALESMAN RECORD .
    DISPLAY DIST-NO MAN-NAME .
    MOVE 1 TO X3 .
    MOVE Z2 TO KEY3 .
    FIND ORDER USING KEY3 .
    FIND NEXT ORDER RECORD OF ORDERS SET .
PARA-110.
    MOVE CURRENCY STATUS FOR ORDERS SET TO Z4 .
    FIND FIRST ENTRY RECORD OF ORDERLIST SET .
    MOVE CURRENCY STATUS FOR ORDERLIST SET TO KEY6 .
    PERFORM PARA-112 UNTIL ERRORSTATUS IS NOT EQUAL TO 0 .
    MOVE Z4 TO KEY3 .
    FIND ORDER USING KEY3 .
    FIND NEXT ORDER RECORD OF ORDERS SET .
PARA-112.
    MOVE CURRENCY STATUS FOR ORDERLIST SET TO Z3 .
    GET ENTRY RECORD .
    MOVE X11 + XCOST TO X11 .
    MOVE Z3 TO KEY6 .
    FIND ENTRY USING KEY6 .
    FIND NEXT ENTRY RECORD OF ORDERLIST SET .
PARA-111.
    DISPLAY ’IMPORTANT’ ’CUSTOMER!’ .




ENTER PROGRAM NAME P5
READ DSK:? T

PRIMARY RECORD (MAIN)
♦ORDER
CONDITIONS FOR RETRIEVAL
♦(ORD-NO EQ RUNTIME)
♦NIL
ITEMS OR STATS TO BE DISPLAYED
♦ORD-NO
♦ORD-DATE
♦DIST-NO
♦MAN-NAME
♦CUST-NAME
♦CUST-ADDRESS
♦REPEAT
PRIMARY RECORD (REPEAT)
♦ENTRY
CONDITIONS FOR RETRIEVAL
♦NIL
ITEMS OR STATS TO BE DISPLAYED
♦ITEM-NAME
♦QUANT
♦COST
♦XCOST
♦COND
CONDITIONS FOR RETRIEVAL
♦(QUANT GT QOH)
♦NIL
ITEMS OR STATS TO BE DISPLAYED
♦’STOCKOUT!’
♦’ONLY’
♦QOH
♦’AVAILABLE’
♦NIL
♦NIL
♦NIL
POSSIBLE PORTS ARE:
(ORD-NO)
SELECT ONE OR TYPE NIL ORD-NO


Figure 2-36. Query P5: "For the run-time order, display the order number,
date, district number, salesman name, customer name, and customer address.
Also display the name, quantity ordered, cost and extended cost of every item
on the order. If the order cannot be filled, display the amount actually available
in the message ’stockout! only -- available’."




THE:GOAL: (PROGRAM P5):IS:ATTAINABLE:BY:THE:FOLLOWING:PROGRAM:

PROCEDURE DIVISION.
PROC1 SECTION.
PARA-100.
    OPEN AREA MAINA .
    DISPLAY ’ORD-NO’ ’EQ?’ .
    ACCEPT ORD-NO .
    FIND ORDER RECORD .
    IF ERRORSTATUS IS NOT EQUAL TO 0 NEXT SENTENCE
    ELSE PERFORM PARA-101.
    CLOSE AREA MAINA .
    STOP .
PARA-101.
    GET ORDER RECORD .
    FIND OWNER RECORD OF SALES SET .
    FIND OWNER RECORD OF SALESMEN SET .
    GET DISTRICT-HDR RECORD .
    GET SALESMAN RECORD .
    FIND OWNER RECORD OF ORDERS SET .
    GET CUSTOMER RECORD .
    DISPLAY ORD-NO ORD-DATE DIST-NO MAN-NAME CUST-NAME
        CUST-ADDRESS .
    FIND FIRST ENTRY RECORD OF ORDERLIST SET .
    MOVE CURRENCY STATUS FOR ORDERLIST SET TO KEY6 .
    PERFORM PARA-102 UNTIL ERRORSTATUS IS NOT EQUAL TO 0 .
PARA-102.
    MOVE CURRENCY STATUS FOR ORDERLIST SET TO Z1 .
    FIND OWNER RECORD OF SOLD SET .
    GET INVENTORY RECORD .
    GET ENTRY RECORD .
    DISPLAY ITEM-NAME QUANT COST XCOST .
    GET INVENTORY RECORD .
    GET ENTRY RECORD .
    IF QUANT IS NOT GREATER THAN QOH NEXT SENTENCE
    ELSE PERFORM PARA-103.
    MOVE Z1 TO KEY6 .
    FIND ENTRY USING KEY6 .
    FIND NEXT ENTRY RECORD OF ORDERLIST SET .
PARA-103.
    DISPLAY ’STOCKOUT!’ ’ONLY’ QOH ’AVAILABLE’ .




Figure  2-37.  Program  P5. 




ENTER PROGRAM NAME P6
READ DSK:? T

PRIMARY RECORD (MAIN)
♦DOCTOR
CONDITIONS FOR RETRIEVAL
♦(DOCNO EQ RUNTIME)
♦NIL
ITEMS OR STATS TO BE DISPLAYED
♦DOCNO
♦DOCNAME
♦REPEAT
PRIMARY RECORD (REPEAT)
♦PATIENT
CONDITIONS FOR RETRIEVAL
♦NIL
ITEMS OR STATS TO BE DISPLAYED
♦PATNO
♦PATNAME
♦REPEAT
PRIMARY RECORD (REPEAT)
♦DOCTOR
CONDITIONS FOR RETRIEVAL
♦NIL
ITEMS OR STATS TO BE DISPLAYED
♦DOCNO
♦DOCNAME
♦NIL
♦NIL
♦NIL
POSSIBLE PORTS ARE:
(DOCNO)
SELECT ONE OR TYPE NIL DOCNO


Figure 2-38. Query P6: "Display the name and number of a doctor specified at
run-time. Also display the name and number of all of his patients, and for each
patient display the name and number of all of his doctors."




THE:GOAL: (PROGRAM P6):IS:ATTAINABLE:BY:THE:FOLLOWING:PROGRAM:

PROCEDURE DIVISION.
PROC1 SECTION.
PARA-100.
    OPEN AREA A1 A2 .
    DISPLAY ’DOCNO’ ’EQ?’ .
    ACCEPT DOCNO .
    FIND DOCTOR RECORD .
    IF ERRORSTATUS IS NOT EQUAL TO 0 NEXT SENTENCE
    ELSE PERFORM PARA-101.
    CLOSE AREA A1 A2 .
    STOP .
PARA-101.
    GET DOCTOR RECORD .
    DISPLAY DOCNO DOCNAME .
    FIND FIRST TREATMENT RECORD OF TREATING SET .
    MOVE CURRENCY STATUS FOR TREATING SET TO TDBKEY .
    PERFORM PARA-102 UNTIL ERRORSTATUS IS NOT EQUAL TO 0 .
PARA-102.
    MOVE CURRENCY STATUS FOR TREATING SET TO Z2 .
    FIND OWNER RECORD OF TREATMENTS SET .
    GET PATIENT RECORD .
    DISPLAY PATNO PATNAME .
    FIND FIRST TREATMENT RECORD OF TREATMENTS SET .
    MOVE CURRENCY STATUS FOR TREATMENTS SET TO TDBKEY .
    PERFORM PARA-103 UNTIL ERRORSTATUS IS NOT EQUAL TO 0 .
    MOVE Z2 TO TDBKEY .
    FIND TREATMENT USING TDBKEY .
    FIND NEXT TREATMENT RECORD OF TREATING SET .
PARA-103.
    MOVE CURRENCY STATUS FOR TREATMENTS SET TO Z1 .
    FIND OWNER RECORD OF TREATING SET .
    GET DOCTOR RECORD .
    DISPLAY DOCNO DOCNAME .
    MOVE Z1 TO TDBKEY .
    FIND TREATMENT USING TDBKEY .
    FIND NEXT TREATMENT RECORD OF TREATMENTS SET .


Figure  2-39.  Program  P6. 




3 DESIGN OF DATA STRUCTURES.

3.1  Introduction. 

3.1.1  Functional  design. 

The Data Base Task Group (DBTG) [CODASYL 1971a] describes a new job
function called Data Base Administrator (DBA):

"He will:

Employ a data structure(s) that models the business or
problem... Assign names in such a manner as to assure
their uniqueness. Select search strategies..."


In  other  words,  the  DBA  designs  data  structures.  Although  a structure  can  be 
defined  existentially,  as  suggested  by  DBTG,  "...models  the  business...",  such  a 
data  base  may  contain  information  which  will  never  be  used.  In  addition,  the 
quality  of  such  a data  base  is  heavily  dependent  on  the  DBA’s  ability  to 
recognize  the  environment  that  must  be  captured  (modeled)  in  the  data  base.  It 
may  be  much  better  to  define  a structure  functionally,  e.g.  based  on 
anticipated  information  demands  that  the  data  base  must  satisfy. 


In this chapter I will first describe how the DBA can utilize the retrieval
program generator from the previous chapter as an aid in functional data base
design. Following this, I will describe an automatic data base structure
designer, hereafter called the designer, automatic designer, or machine designer,
that can design a satisfactory data base structure for a set of information
queries. Since this designer uses only a set of anticipated queries as input, the
output is a functional design rather than an existential design.




3.1.2 Relation to programmer.

The automatic data base structure designer and the automatic information
retrieval programmer of Chapter 2 form a package. Both programs use the
HI-IQ query language. The automatic designer generates a data base structure
that can be used as a data base description for the automatic programmer.

The programs are also similar because both translate from relational
descriptions to access path descriptions.

3.1.3 Limitations.

This chapter is concerned with the design of data structures. Data structure
design is only part of the overall problem of data base design. Some other
design issues, not considered here, are decisions regarding data types, data
base subdivision into areas, storage media, record ordering, record selection,
and data protection. Also, data characteristics such as relative volumes,
volatility of data, etc., that may affect data base structure design decisions have
been ignored in this chapter. Ignoring these design issues and data
characteristics will never result in data structures that are inconsistent with the
set of queries, but it may lead to the design of inefficient data structures.

The automatic structure designer determines record content and record
relationships (sets) and also suggests inversions and items to be used as
calculated keys. The output of the automatic designer is not equivalent to a
complete DDL [CODASYL 1971a] description of a data base. Many statements
necessary in a complete DDL description affecting record ordering, set ordering,
selection mode, data privacy, data pictures, area content, etc., are not generated
by the automatic data base structure designer.




3.2 Use of the Programmer.

The retrieval program generator described in the previous chapter can be used
by the DBA during the data base design process. Since the program generator
is data-free, program generation is solely a function of the data base description
and the query. This "data-freeness" is an important difference between
generalized interpreters and automatic programming. It is one of my reasons
for using automatic programming rather than designing a generalized
interpreter. (See section 2.3.) Data-freeness also permits use of the
programmer during the design phase when a data base is not yet available.

Figure 3-1 illustrates the use of a programmer (human or machine) to aid in data
structure design. The user communicates his needs (in the form of a query) to
the programmer, the programmer obtains the proposed data base structure from
the DBA and attempts to write a program. If the programmer is unsuccessful he
communicates this to the DBA. Otherwise, he provides the DBA with the
program. In either case the DBA can judge the validity of the proposed data
base structure and make appropriate modifications. If any modifications are
made, then the programmer tries again to create a program until a cycle is
completed in which no data base structure modifications occur.
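
The cycle just described is a fixed-point iteration over the proposed
structure. The Python sketch below is one way to picture it. Everything in it
is an assumption made for illustration: the names `settle_structure`,
`try_program`, and `revise`, and the toy model in which a structure is simply
the set of record names it contains.

```python
def settle_structure(query, structure, try_program, revise):
    """Repeat program generation until a cycle passes with no structure change."""
    while True:
        program = try_program(query, structure)      # None signals failure
        revised = revise(structure, query, program)  # the DBA's judgment
        if revised == structure:                     # no modification this cycle
            return structure, program
        structure = revised


# Toy stand-ins (assumptions): a "structure" is just the set of record names it
# contains, and a query is the set of record names it needs.
def try_program(query, structure):
    return f"program over {sorted(query)}" if query <= structure else None


def revise(structure, query, program):
    return structure if program is not None else structure | query


final, program = settle_structure(
    frozenset({"DOCTOR", "PATIENT"}), frozenset({"DOCTOR"}), try_program, revise
)
```

The loop ends only if the DBA's revisions eventually stabilize; in this toy
model that happens after a single revision.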




Figure  3-1.  A simple  user  - programmer  - DBA  system. 

This same process is then repeated for another query until all queries have
been translated to satisfactory programs. Upon completion of this
time-consuming process the DBA can be reasonably confident that the proposed
structure is sufficient for user needs. Note that the data base itself is not
involved in this system. (Another feedback loop in the user-programmer-DBA
system is closed after the data base has been created and the user obtains the
results of the execution of the programs associated with his needs. At this
point other adjustments become necessary, usually resulting in restructuring,
discussed in Chapter m.)

This technique is laborious and random, especially for the DBA. To make the
process feasible, other lines of communication are necessary between the
programmer and the DBA (to report causes of failure) and between the users
and the DBA (for naming conventions). An algorithm for finding the initial data
base structure must also be specified.

Human programmers cannot be used for this method of design because they
would take too long and the cost of writing the programs for testing each design
would be too high. This restriction does not apply to the machine programmer
discussed in Chapter 2. I have not designed a data base structure using the
automatic programmer in the process described above; however, consideration
of this process led me to automate the data base structure design process in
its entirety. This is the subject of the next section.


3.3  Automatic  Data  Structure  Design. 

3.3.1  Overview. 

The implemented designer generates a data structure design in one pass as
illustrated in Figure 3-2. This system is not iterative as was the process
discussed for Figure 3-1.

Structure design requires two phases. In the first phase all queries are
scanned, and for each query a set of assertions is generated that describe the
structural concepts of the data base necessary for the query. A particular
assertion is a constraint on the design, but there may be many designs which
satisfy that assertion. The second phase tries to find a design that satisfies all
of the assertions.


The two phases are illustrated in Figure 3-2 as follows:

(a) Generate assertions
(first three steps in Figure 3-2)

(b) Record and network construction
(last four steps in Figure 3-2)

The program for phase 1 is almost identical to the Request Handler described in
Chapter Two. Since the HI-IQ language is used by both the designer and the
program generator, the query interpreter will be the same for both.
Furthermore, each assertion made in the first design phase corresponds to a
structural concept that the program generator would require in the defined data
structure.

The designer permits interruption while generating assertions. This allows a
structure to be designed from a subset of the queries with a later continuation
of the design process from the point of interruption. The DBA can therefore
use the system to quickly determine the impact on the data base structure of a
particular subset of queries.

The actual design process occurs in the fourth and fifth boxes of the chart in
Figure 3-2. In these steps the system tries to derive a network structure
design that is consistent with the assertions.


Figure 3-2. Data base structure designer: general program flow.



3.4 Defining Item Names.

Since several users may be generating queries, there are two problems
related to item (attribute) names. Different users might use the same name for
different attributes or they might use different names when referring to the
same attribute.

In addition, misspelling can occur. A design system has difficulty with these
problems because it has no external corroboration for verification of names.

There are several ways of handling this problem. I have chosen the last of
these approaches:

a) Indicate the caveat: Errors in data base structure may
result from errors in naming items and records or from the
use of identical names for different items and records.

b) Interactive corroboration: Provide for interaction
between the machine designer and the DBA whenever a name
is encountered for the first time or in any other situations
where errors might go undetected without additional
corroboration.

c) Definitive corroboration: Require a definition list of
names prior to beginning the design process.




3.5  Generating  Assertions. 

The automatic designer first generates assertions regarding the data base
structure that is to be designed. These assertions are used to direct and
constrain the eventual design process.


The assertions made in this phase concern item and record relationships.
Although the query may contain commands such as COUNT, ANY, REPEAT, etc.,
these commands are ignored except to note that the record mentioned in the
context of such a command must be hierarchically below the record mentioned
prior to that command. Similarly, the designer makes no distinction between the
"CONDITIONS FOR RETRIEVAL" and the "ITEMS OR STATS" inputs for each
record. Both inputs are simply scanned for names which are not commands.
These names are then checked against the master name list and, if found,
assumed to be items.
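
The scan described above can be sketched in a few lines of Python. This is a
hypothetical illustration rather than the designer's code: the function name
`item_names`, the tokenizing pattern, and the exact contents of `COMMANDS`
(COUNT, ANY, and REPEAT from the text, plus keywords seen in the example
queries) are assumptions.

```python
import re

# Assumed command vocabulary; only COUNT, ANY and REPEAT are named in the text.
COMMANDS = {"COUNT", "ANY", "REPEAT", "TOT", "AVE", "COND", "NIL", "OR"}


def item_names(inputs, master_names):
    """Scan query input lines and return the names accepted as items, in order."""
    found = []
    for line in inputs:
        for token in re.findall(r"[A-Z][A-Z0-9-]*", line):
            if token in COMMANDS:
                continue                  # commands are not items
            if token in master_names and token not in found:
                found.append(token)       # on the master name list: an item
    return found


master = {"ORD-NO", "ORD-DATE", "QUANT", "QOH"}
items = item_names(
    ["(ORD-NO EQ RUNTIME)", "ORD-DATE", "REPEAT", "(QUANT GT QOH)", "ORD-DTAE"],
    master,
)
```

Names that are not on the master list, such as the misspelled ORD-DTAE or the
operator EQ, are simply dropped in this sketch; how the designer reports them
is not stated here.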


That the designer ignores these commands should give some indication of its
limitations. A better system would consider some of the aforementioned usages
to (for example) design an ordered set.


Assertions are made wherever the programmer would have checked for the
presence of certain relationships in the defined structure. There are only three
such assertions. These are listed along with their meanings in Figure 3-3.
Succeeding paragraphs will discuss the generation of these assertions; it should
be noted that duplicate assertions are never generated.


ABOVE (A,B)

Record A is hierarchically above record B.

INORABOVE (I,A,B)

Item I must be contained in record A or in a record
hierarchically above record A unless records A and B form
a confluent hierarchy, in which case item I must be
contained in or above the record which forms the base of
the confluent hierarchy.

CALCPORT (I)

Item I has been used in an equivalency test and may
therefore be suitable for use as a calculated key.


Figure 3-3. Assertions and their meanings.
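
As a rough illustration, the three assertions can be represented as small
immutable records held in a set, which also gives the "no duplicates"
property for free. The class and field names below are assumptions chosen for
this sketch; only the assertion names and their arguments come from Figure 3-3.

```python
from typing import NamedTuple


class Above(NamedTuple):
    upper: str   # record A
    lower: str   # record B, hierarchically below A


class InOrAbove(NamedTuple):
    item: str            # item I
    record: str          # context record A
    context_record: str  # record B, the other record of the query context


class CalcPort(NamedTuple):
    item: str    # item I, used in an equivalency test


assertions = set()  # a set silently discards duplicate assertions

for a in (
    Above("DOCTOR", "PATIENT"),
    CalcPort("DOCNO"),
    Above("DOCTOR", "PATIENT"),  # repeated on purpose
    InOrAbove("PATNAME", "PATIENT", "DOCTOR"),
):
    assertions.add(a)
```

Named tuples compare by value, so the repeated ABOVE assertion is stored once.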


3.5.1 The ABOVE assertion.

An ABOVE assertion is generated for every record name encountered in a query
except for the top level record of the query. Thus if A is the top level record
and B and C are second level records in a particular query, then two assertions,
ABOVE(A,B) and ABOVE(A,C), are made. If D is a third level record and is
mentioned within the second level context of record B then the third assertion
made is ABOVE(B,D). The actual design process will treat these assertions as
constraints so that ABOVE(A,B) can be satisfied in one of three ways:

a) Records A and B are top level members of the same
confluent hierarchy

b) Existing hierarchies provide an upward path from record
B to record A (transitivity)




c) Records A and B are contained in a set with A as owner
and B as member.
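
The generation rule and cases (b) and (c) can be sketched as follows. The
nested-tuple query representation and the function names `above_assertions`
and `has_upward_path` are assumptions made for this illustration; case (a),
the confluent hierarchy, is deliberately left out.

```python
from collections import deque


def above_assertions(query):
    """query is (record_name, [child queries]); returns ABOVE pairs level by level."""
    out = []
    queue = deque([query])
    while queue:
        record, children = queue.popleft()
        for child in children:
            pair = (record, child[0])
            if pair not in out:          # duplicate assertions are never generated
                out.append(pair)
            queue.append(child)
    return out


def has_upward_path(upper, lower, owner_member_sets):
    """True if `upper` is reachable from `lower` by following member -> owner links.

    A direct owner/member set is case (c); a longer chain is case (b).
    """
    frontier, seen = [lower], {lower}
    while frontier:
        rec = frontier.pop()
        for owner, member in owner_member_sets:
            if member == rec and owner not in seen:
                if owner == upper:
                    return True
                seen.add(owner)
                frontier.append(owner)
    return False


query = ("A", [("B", [("D", [])]), ("C", [])])
pairs = above_assertions(query)
sets = {("A", "B"), ("B", "D")}
```

With these sets, ABOVE(A,D) is satisfied by transitivity through B, while
ABOVE(A,C) is not yet satisfied and would require a new set or hierarchy.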


3.5.2 The INORABOVE assertion.

Items can be mentioned in a query both as intended for display or for use in a
conditional test. In either case they are always mentioned within the context of
a particular record. The principle of downward attribute migration lets the
system conclude that the items must be physically located within or
hierarchically above the context record.

For those unfamiliar with the concept of confluent hierarchies, an example of
such a structure is given on the left in Figure 3-4. The TREATMENT record is
the base record of this structure because it is hierarchically below all other
records. This structure is also discussed in section 1.3, Figure 1-3. Since any
of the records included in a confluent hierarchy can occur hierarchically above
any of the other records, such a structure can be processed "both ways". The
query context determines which way the confluent hierarchy is being processed.
It is for this reason that the INORABOVE assertion includes a description of the
query context, described with records A and B.


Figure 3-4 illustrates the effect of query context. It is perfectly legal for a
query to refer to the unique DIAGNOSIS associated with a particular PATIENT
record if the query context is the DOCTOR-PATIENT hierarchy, even though the
actual data base structure has the DIAGNOSIS item below the PATIENT record.
This is legal since the relationship can be reversed because of the confluency of
the DOCTOR and PATIENT records. Because of this confluent hierarchy it is
possible to associate many DIAGNOSIS records with a PATIENT in one context
and to associate a unique DIAGNOSIS in another context.

Since it is not known at the time the assertions are made if a particular context
hierarchy will be part of a confluent hierarchy, the system must indicate the
hierarchical context, not just the context record, when describing the use of a
particular item. Thus, for the previous example, the assertion would be:
INORABOVE(DIAGNOSIS,PATIENT,DOCTOR). To the second phase, this assertion
will mean "DIAGNOSIS must be contained within or above the PATIENT record
unless the PATIENT and DOCTOR records exist in a confluent hierarchy in which
case the DIAGNOSIS item must be contained within or above the lowest or base
record of the confluent hierarchy."
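The reading that the second phase gives to an INORABOVE assertion can be illustrated with a small Python sketch. The function name required_anchor and the dictionary of confluent hierarchies are assumptions made for the illustration; the report does not describe the system's internal representation.

```python
def required_anchor(assertion, confluent_bases):
    """Return the record in or above which the item must be stored.

    assertion       -- (item, context_record, upper_record), i.e. INORABOVE(I,A,B)
    confluent_bases -- maps frozenset({A, B}) to the base record of the
                       confluent hierarchy formed by A and B (hypothetical layout)
    """
    _item, context, upper = assertion
    base = confluent_bases.get(frozenset((context, upper)))
    # Without a confluent hierarchy the context record itself is the bound.
    return context if base is None else base


assertion = ("DIAGNOSIS", "PATIENT", "DOCTOR")
bases = {frozenset(("PATIENT", "DOCTOR")): "TREATMENT"}

print(required_anchor(assertion, bases))  # confluent case
print(required_anchor(assertion, {}))     # plain hierarchy
```

With the DOCTOR-PATIENT confluency present the bound is the base record TREATMENT; without it the bound is PATIENT.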


[Figure 3-4: two panels, DATA BASE STRUCTURE and QUERY CONTEXT STRUCTURE.]

Figure 3-4. Transformation of a confluent hierarchy within
the query context.

3.5.3 The CALCPORT assertion.

The designer does not use the CALCPORT assertion. It is generated to aid the
DBA in determining which items should be used as calculated keys or in file
inversions. CALCPORT(I) states that item I has been used in a test of the form
(I EQ ...), and that possible use of I as a calculated key is suggested.

3.5.4 Remark on attribute migration.

In the first two chapters the concept of attribute migration was introduced.
Since the data base structure designer ignores all statistical commands it need
be concerned only with downward attribute migration. If attribute (item) I
occurs in the context of record A then, because of downward attribute
migration, the system can conclude that item I can only be physically stored with
record A or any record which is hierarchically above record A. Thus from
every direct item-record association the system derives a lower bound on the
physical location of that item within the data base structure. If the system
always physically locates an item at the highest lower bound encountered so far,
then an item's physical location will tend to migrate upward in the structure as
the structure becomes more constrained during the generation of assertions.
Downward attribute migration results in upward physical location migration
during the design process.




3.6 Designing Record Relationships.

The process of designing record relationships is illustrated in Figure 3-5. Two
types of logical relationships are recognized: direct hierarchies and confluencies.
Since a confluency can be expressed with two or more direct hierarchies, the
final design will specify only direct hierarchies. Each such hierarchy is defined
with a HIERARCHYGROUP assertion. Each box in the flow is actually a loop
which attempts to erase ABOVE assertions and continues to do so until an
iteration occurs in which no assertion is erased.




Figure  3-5.  Construction  of  record  relationships. 


The principal input to this phase of the design process is the ABOVE assertions
generated by the Request Handler. When the record relationship design
process has been completed, there are no remaining ABOVE assertions. A brief
discussion of each step in Figure 3-5 follows.


3.6.1 Detect confluencies.

If ABOVE(A,B) and ABOVE(B,A) have been asserted then records A and B exist in
a confluency. The system therefore asserts CONFLUENCY(A,B) and erases both
of the ABOVE assertions.
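A minimal Python sketch of this step follows. It assumes that ABOVE assertions are held as (upper, lower) pairs in a set, which is a representation chosen for the illustration rather than the one used by the system; detect_confluencies is a hypothetical name.

```python
def detect_confluencies(above):
    """Replace every mutual pair ABOVE(A,B), ABOVE(B,A) by CONFLUENCY(A,B).

    Returns (remaining ABOVE pairs, set of confluencies). A confluency is
    stored as a frozenset because CONFLUENCY(A,B) and CONFLUENCY(B,A) are
    equivalent.
    """
    remaining = set(above)
    confluencies = set()
    for a, b in sorted(above):
        if a != b and (a, b) in remaining and (b, a) in remaining:
            remaining -= {(a, b), (b, a)}          # erase both ABOVE assertions
            confluencies.add(frozenset((a, b)))    # assert the confluency
    return remaining, confluencies


above = {
    ("DOCTOR", "PATIENT"), ("PATIENT", "DOCTOR"),
    ("DOCTOR", "TREATMENT"), ("PATIENT", "TREATMENT"),
    ("DOCTOR", "HOSPITAL"), ("HOSPITAL", "DOCTOR"),
}
remaining, confluencies = detect_confluencies(above)
print(sorted(remaining))
print(sorted(sorted(c) for c in confluencies))
```

On these six assertions the sketch erases four of them and reports the DOCTOR-PATIENT and HOSPITAL-DOCTOR confluencies.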

3.6.2  Eliminate  redundant  ABOVE. 

If ABOVE(A,B), ABOVE(B,C), and ABOVE(A,C) have been asserted, then ABOVE(A,C)
is redundant (and can be erased) since it is derivable from the remaining two
assertions.
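The redundancy test amounts to asking whether the lower record can still be reached through two or more other links. The Python sketch below is one way to express it, assuming ABOVE assertions are (upper, lower) pairs in a set; both function names are hypothetical.

```python
def indirectly_above(above, a, b):
    """True if a chain of two or more ABOVE links leads from a down to b."""
    seen = set()
    # First step: any record directly below a, other than b itself.
    frontier = [low for (up, low) in above if up == a and low != b]
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        for up, low in above:
            if up == current:
                if low == b:
                    return True
                frontier.append(low)
    return False


def eliminate_redundant(above):
    """Erase every ABOVE(A,B) that is derivable from the other assertions."""
    remaining = set(above)
    for pair in sorted(above):
        # The direct link itself may not take part in the derivation.
        if indirectly_above(remaining - {pair}, *pair):
            remaining.discard(pair)
    return remaining


print(sorted(eliminate_redundant({("A", "B"), ("B", "C"), ("A", "C")})))
```

For the three assertions of the text only ABOVE(A,C) is erased.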

3.6.3 Construct confluent hierarchies.

If CONFLUENCY(A,B), ABOVE(A,C) and ABOVE(B,C) have all been asserted, then
the confluency has been captured in the confluent hierarchy defined by the two
ABOVE assertions and the CONFLUENCY assertion is simply erased.

If CONFLUENCY(A,B) has been asserted, but no record C exists as above, then
the system constructs a new (artificial) record, D, and asserts ABOVE(A,D) and
ABOVE(B,D). The CONFLUENCY assertion is erased as above.

In both cases, the system will alter all INORABOVE(I,A,B) assertions to
INORABOVE(I,C,-) or INORABOVE(I,D,-) respectively. This step recognizes that
all items referenced within the context of the (A,B) confluency must be
contained in or above the base record (C or D respectively) of the confluent
hierarchy.
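The two cases can be sketched in Python as follows. The sketch makes several assumptions that are not taken from the report: assertions are tuples in sets, a common bottom is looked for only among records directly below both A and B, and the rewritten INORABOVE assertion keeps the old context record as its third argument. The function and parameter names are hypothetical.

```python
def construct_confluent_hierarchy(a, b, above, inorabove, fresh_names):
    """Resolve CONFLUENCY(a, b) into a confluent hierarchy.

    above       -- set of (upper, lower) pairs
    inorabove   -- set of (item, context_record, upper_record) triples
    fresh_names -- iterator yielding names for artificial records
    Returns (base_record, new_above, new_inorabove).
    """
    above = set(above)
    below_a = {low for (up, low) in above if up == a}
    below_b = {low for (up, low) in above if up == b}
    common = sorted(below_a & below_b)
    if common:
        base = common[0]                       # an existing record C
    else:
        base = next(fresh_names)               # an artificial record D
        above |= {(a, base), (b, base)}
    rewritten = set()
    for item, context, upper in inorabove:
        if {context, upper} == {a, b}:
            # The item must now sit in or above the base record.
            rewritten.add((item, base, context))
        else:
            rewritten.add((item, context, upper))
    return base, above, rewritten


names = (f"SYM{n:02d}" for n in range(1, 100))
above = {("DOCTOR", "TREATMENT"), ("PATIENT", "TREATMENT")}
inorabove = {("DIAGNOSIS", "DOCTOR", "PATIENT"), ("OFFICENO", "HOSPITAL", "DOCTOR")}

base1, above, inorabove = construct_confluent_hierarchy(
    "DOCTOR", "PATIENT", above, inorabove, names)
base2, above, inorabove = construct_confluent_hierarchy(
    "HOSPITAL", "DOCTOR", above, inorabove, names)
print(base1, base2)
print(sorted(above))
print(sorted(inorabove))
```

When several records lie below both A and B the sketch simply takes the first one, which is why redundant ABOVE assertions need to be erased before this step is run.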



The reason for processing confluencies in two steps, separated by the step to
erase redundant assertions, is best illustrated by examples.

Consider the following set of assertions:

ABOVE(A,B)  ABOVE(B,A)

ABOVE(A,C)  ABOVE(B,C)

Either of the latter two assertions is redundant (not both). Assume ABOVE(B,C)
is erased before the detection of confluencies. Then the following assertions
remain:

ABOVE(A,B)  ABOVE(B,A)

ABOVE(A,C)

Now the system cannot discover that C is the base record of the confluent
hierarchy.

Although postponing the erasure of redundant assertions to follow the
generation of confluent hierarchies will not affect the design of record
relationships, it can have an adverse effect on the later design of record
contents.


Again, consider an example. If

ABOVE(A,B)  ABOVE(B,A)
ABOVE(A,C)  ABOVE(B,C)
ABOVE(A,D)  ABOVE(B,D)
ABOVE(C,D)
INORABOVE(I,A,B)

have all been asserted, then postponing the erasure of ABOVE(A,D) and
ABOVE(B,D) may result in the detection of record D as the base record of the
confluency. This in turn will result in the incorrect assertion INORABOVE(I,D,-).


3.6.4  Detect  rings. 


If ABOVE(A,B), ABOVE(B,C) and ABOVE(C,A) have all been asserted, then the
system asserts RING(A,B). This assertion has no further impact on the design
process, but it tells the DBA that the data structure contains a circular
sub-structure. Such a sub-structure, although legal, is highly unlikely, and its
occurrence will usually signal the presence of an error.
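Ring detection is a cycle test on the ABOVE relation. The Python sketch below reports every pair of records that lie on a common cycle; it assumes (upper, lower) pairs in a set and uses the hypothetical name find_rings. Reporting all pairs is a choice made for the illustration, since a single RING assertion is enough to warn the DBA.

```python
def find_rings(above):
    """Return the pairs of records that are (transitively) above each other."""
    below = {}
    for upper, lower in above:
        below.setdefault(upper, set()).add(lower)

    def reachable(start):
        """All records that can be reached by following ABOVE links downward."""
        seen, stack = set(), [start]
        while stack:
            for nxt in below.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reach = {record: reachable(record) for record in below}
    return {
        frozenset((a, b))
        for a in reach
        for b in reach[a]
        if a != b and a in reach.get(b, set())
    }


rings = find_rings({("A", "B"), ("B", "C"), ("C", "A")})
print(sorted(sorted(r) for r in rings))
print(find_rings({("A", "B"), ("B", "C")}))
```

Mutual pairs have already been turned into confluencies by the time this step runs, so in practice only cycles of three or more records are found here.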


3.6.5  Generate  direct  hierarchies. 


This is the last step of Figure 3-5, and it transforms all remaining ABOVE
assertions to HIERARCHYGROUP assertions. Each HIERARCHYGROUP assertion
defines a DBTG "set" and therefore requires a unique name. The system
generates such names in the form SYMnn (00<nn<100).




3.7  A Frame  for  Record  Relationship  Design. 

The structure designer is defined to the APG with a Frame of rules (Figure 3-6).
The Frame is compiled to Micro-Planner theorems by the APG system. (See
section 2.2.) These Micro-Planner theorems form the basis of the data base
structure designer's decision subsystem. Although the APG compiler is used,
the resulting program is not a procedure generator, but rather a declarative
generator. Use of the APG compiler to compile programs which are not
procedure generators requires the implementation of a new rule of inference.


This new rule, called an assertion rule (denoted by S5), permits assertions to be
made from pre-conditions without procedure generation. An S5 rule is not
equivalent to a logical implication since the post-condition is asserted to be true
and may contradict the pre-condition. When such a contradiction exists, the
rule erases those assertions that are true in the pre-condition and false in the
post-condition. These erasures help to keep the set of assertions as concise as
possible.


Type S3 rules are also used to define the design process. This type of rule is
described in detail in Chapter Two. Type S3 rules are the equivalent of
implication in logic.


Either type of rule is defined with a pre- and post-condition such that the truth
of the pre-condition implies the truth of the post-condition. In Figure 3-6, the
post-condition is differentiated from the pre-condition with a right arrow (→).


RULE NAME          TYPE          PRE AND POST CONDITIONS

DETECT             S5            ABOVE(A,B) ∧ ABOVE(B,A)
CONFLUENCIES                     → ¬ABOVE(A,B) ∧ ¬ABOVE(B,A) ∧
                                   CONFLUENCY(A,B)

ELIMINATE          S5            ABOVE(A,B) ∧ INDABOVE(A,B)
REDUNDANT                        → ¬ABOVE(A,B)
ABOVE

CONSTRUCT          S5            CONFLUENCY(A,B) ∧
CONFLUENT                        [COMBOT(A,B,C) ∨ =(C,(GENSYM))] ∧
HIERARCHIES                      SWITCHINORABOVE(A,B,C) ∧
                                 SWITCHINORABOVE(B,A,C)
                                 → ¬CONFLUENCY(A,B) ∧
                                   ABOVE(A,C) ∧ ABOVE(B,C)

DETECT             S5            ABVUPLINK(A,B) ∧ ABVUPLINK(B,A)
RINGS                            → RING(A,B)

GENERATE           S5            ABOVE(A,B)
DIRECT                           → ¬ABOVE(A,B) ∧
HIERARCHIES                        HIERARCHYGROUP(A,B,(GENSYM))

COMMON             S3            [ABVUPLINK(C,A) ∧ ABVUPLINK(C,B)] ∨
BOTTOM                           [CONFLUENCY(C,D) ∧ ABVUPLINK(D,B) ∧
                                 [=(C,A) ∨ ABVUPLINK(C,A)]]
                                 → COMBOT(A,B,C)

LINK UP            S3            ABOVE(B,A) ∨ INDABOVE(B,A)
USING                            → ABVUPLINK(A,B)
ABOVE

INDIRECTLY         RECURSIVE     ABOVE(A,C) ∧ ¬=(B,C) ∧
ABOVE              S3            [ABOVE(C,B) ∨ INDABOVE(C,B)]
                                 → INDABOVE(A,B)

CONFLUENCY         S3            CONFLUENCY(B,A)
EQUIVALENCE                      → CONFLUENCY(A,B)

EXCHANGE ALL       RECURSIVE     ¬INORABOVE(I,A,B) ∨
ITEMS IN           S3            [INORABOVE(I,A,B) ∧ EXCHANGE(I,A,B,C) ∧
CONFLUENT                        SWITCHINORABOVE(A,B,C)]
HIERARCHY                        → SWITCHINORABOVE(A,B,C)

EXCHANGE AN        S5            INORABOVE(I,A,B)
ITEM IN                          → EXCHANGE(I,A,B,C) ∧
CONFLUENT                          ¬INORABOVE(I,A,B) ∧
HIERARCHY                          INORABOVE(I,C,A)


Figure 3-6. Rules for generating record relationships.


3.7.1 Assertions.

The pre- and post-conditions are defined in terms of assertions and the usual
logical notation, ∧, ∨, and ¬. Determination of the truth of the post-condition is
accomplished by evaluating some, if not all, of the assertions in the
pre-condition. A particular assertion is evaluated to true (false) if it is true
(false) in the present state, or by the evaluation of the pre-condition of a rule
which has the particular assertion as a post-condition.

The not sign (¬) preceding an assertion in a post-condition causes that
assertion to be removed from the current state.

(GENSYM) causes a new symbol to be generated in the form of SYMnn. Thus,
"=(C,(GENSYM))" causes the variable C to be bound to a new symbol.

A semantic interpretation of the assertions is given in Figures 3-3 and 3-7.



ASSERTION                     MEANING

CONFLUENCY(A,B)               Records A and B form a confluency.

COMBOT(A,B,C)                 Records A and B are both directly
                              or indirectly above record C.

SWITCHINORABOVE(A,B,C)        All INORABOVE(I,A,B) have been
                              changed to INORABOVE(I,C,A).

INDABOVE(A,B)                 Record A is indirectly but not
                              directly above record B (uses
                              transitivity of ABOVE).

HIERARCHYGROUP(A,B,S)         Record A is the owner and record B
                              is the member of set S.

ABVUPLINK(A,B)                Record B is directly or indirectly
                              above record A.

EXCHANGE(I,A,B,C)             INORABOVE(I,A,B) has been converted
                              to INORABOVE(I,C,A).

DBKEY(A,I)                    I is the data base key for record A.

CONTAINS(A,I)                 Record A physically contains item I.

UPLINK(A,B)                   Record B is physically above record
                              A. (Uses HIERARCHYGROUP rather
                              than ABOVE.)

COMTOP(A,B,C)                 Record C is physically above both
                              records A and B.

RING(A,B)                     Structure contains at least one ring
                              and it contains records A and B.

=(A,B)                        A has been set equal to B.

Figure 3-7. Assertions and their interpretations.



3.7.2  Using  the  rules. 

The first five rules in Figure 3-6 correspond to the five steps in Figure 3-5
described earlier. The remaining rules are used to determine the values of
some of the assertions used in the pre-conditions of the five principal rules.

The five principal rules are each invoked repetitively by a control processor to
accomplish the processing defined in Figure 3-5. For example, DETECT
CONFLUENCIES is invoked with the goal ¬ABOVE(A,B). If this goal is successful
(e.g., a confluency is detected), the rule is re-invoked with the same goal until
the goal is not achieved.

The APG system was extended to include the type S5 axiom in order to
accommodate this type of processing. Manipulation of the assertions, including
erasure thereof, is necessary to ensure termination of processing. Without such
modification, a rule which is true once remains true forever.
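The role of erasure in termination can be shown with a small Python sketch of the control loop. It is not the APG control processor: the state is assumed to be a set of tuples such as ("ABOVE", "A", "B"), a rule is modelled as a function that performs at most one firing and reports whether its goal was achieved, and both function names are hypothetical.

```python
def detect_confluency_once(state):
    """One firing of a DETECT CONFLUENCIES style rule on a mutable state."""
    for fact in sorted(state):
        if fact[0] != "ABOVE":
            continue
        _, a, b = fact
        if a != b and ("ABOVE", b, a) in state:
            # Erase what the post-condition negates, then assert the rest.
            state.difference_update({("ABOVE", a, b), ("ABOVE", b, a)})
            state.add(("CONFLUENCY", a, b))
            return True      # goal achieved: an ABOVE assertion was erased
    return False             # goal not achieved: nothing left to match


def run_until_failure(rule, state):
    """Re-invoke the rule with the same goal until the goal is not achieved."""
    firings = 0
    while rule(state):
        firings += 1
    return firings


state = {
    ("ABOVE", "DOCTOR", "PATIENT"), ("ABOVE", "PATIENT", "DOCTOR"),
    ("ABOVE", "DOCTOR", "HOSPITAL"), ("ABOVE", "HOSPITAL", "DOCTOR"),
    ("ABOVE", "DOCTOR", "TREATMENT"),
}
print(run_until_failure(detect_confluency_once, state))
print(sorted(state))
```

Because each successful firing removes the assertions that satisfied the pre-condition, the loop stops after two firings here; a rule that asserted CONFLUENCY without erasing anything would succeed on every invocation.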

3.7.3 An example.

Figure  3-8  illustrates  the  change  in  the  assertions  after  applications  of  the  first 
three  rules. 


BEFORE                                  AFTER

ABOVE(DOCTOR,PATIENT)                   ABOVE(PATIENT,TREATMENT)
ABOVE(PATIENT,DOCTOR)                   ABOVE(DOCTOR,TREATMENT)
ABOVE(DOCTOR,TREATMENT)                 ABOVE(DOCTOR,SYM01)
ABOVE(PATIENT,TREATMENT)                ABOVE(HOSPITAL,SYM01)
ABOVE(DOCTOR,HOSPITAL)                  INORABOVE(DIAGNOSIS,TREATMENT,DOCTOR)
ABOVE(HOSPITAL,DOCTOR)                  INORABOVE(OFFICENO,SYM01,HOSPITAL)
INORABOVE(DIAGNOSIS,DOCTOR,PATIENT)
INORABOVE(OFFICENO,HOSPITAL,DOCTOR)


Figure 3-8. Assertions before and after application of the
first three rules.



This example contains two confluencies:

DOCTOR ↔ PATIENT

HOSPITAL ↔ DOCTOR

which are detected by the first rule. This results in the erasure of four ABOVE
assertions and the assertion of two CONFLUENCY assertions.

There are no redundant ABOVE assertions, so none are erased using the second
rule.

The third rule constructs confluent hierarchies. It uses the COMMON BOTTOM
rule to prove:

COMBOT(DOCTOR,PATIENT,TREATMENT)

and the last two rules of Figure 3-6 to change
INORABOVE(DIAGNOSIS,DOCTOR,PATIENT) to
INORABOVE(DIAGNOSIS,TREATMENT,-). The second confluency is similarly
processed, except that a common bottom for HOSPITAL and DOCTOR is not found
so one is created (SYM01).

3.7.4 Remark on data structures.

Figure 3-9 shows three interesting conflicting structures and their conversion to
confluent hierarchies. The X's stand for records generated by the system.
Single arrows indicate ABOVE relationships; A→B is equivalent to ABOVE(A,B).
Double arrows indicate confluencies; A↔B is equivalent to CONFLUENCY(A,B).
Note that due to the Confluency Equivalence rule this is also equivalent to
CONFLUENCY(B,A).

The  first  case  is  interesting  because  any  record  is  below  the  other  two  yet  if 
this  record  is  used  as  the  base  for  a confluent  hierarchy  then  only  two 
confluent  hierarchies  will  result  in  the  final  structure,  as  is  illustrated  in  the 
incorrect  sequence.  The  Common  Bottom  rule  (with  one  exception)  excludes 
confluencies to find a COMBOT.




Figure  3-9.  Some  interesting  structure  transformations. 




The exception to the Common Bottom rule is illustrated in the second case in
Figure 3-9. The second case differs from the first only in that one of the
confluencies has been removed. Since the A↔C confluency will eventually result
in the detection or creation of a record, say D, which is below record C, D will
also be below record B. In general, a "common bottom" can be either a record
or another confluency, but the links used to establish it must all be ABOVE
assertions. Note that if the A↔C confluency were processed first, the same final
structure would result, but the X1 record would be generated in the first rather
than the last step.

The third case is interesting because, although valid, it is probably the result of
error or omission in the set of queries. Most likely one of the hierarchies
either was erroneously referenced or is actually a confluency but was never
referenced in the other direction. The system does not alter this structure, but
the final data base structure design will contain an assertion of the form RING(A,B),
warning the DBA that at least one ring structure exists and that it contains,
among others, records A and B. The rule for detecting a RING structure is
given in Figure 3-6.




3.8 Designing Record Contents.

Processing the INORABOVE assertions to provide CONTAINS assertions for the
data base structure is similar to the processing of the ABOVE assertions to
produce the HIERARCHYGROUP assertions. Figure 3-10 gives the general flow,
and Figure 3-11 contains the rules. The first three rules correspond to the
steps in Figure 3-10. Each box in the flow is actually a loop which attempts to
erase INORABOVE assertions and continues to do so until an iteration occurs in
which no assertion is erased.

The process illustrated in Figure 3-10 ignores the third argument in the
INORABOVE assertions, since all processing associated with confluencies was
done when confluencies were transformed to confluent hierarchies.


Figure 3-10. Determination of record content - general flow.


RULE NAME          TYPE          PRE AND POST CONDITIONS

ELIMINATE                        INORABOVE(I,A,C) ∧ INORABOVE(I,B,D) ∧
REDUNDANT                        [[¬=(A,B) ∧ UPLINK(A,B)] ∨
ASSERTIONS                       [=(A,B) ∧ ¬=(C,D)]]
                                 → ¬INORABOVE(I,A,C)

FIND                             INORABOVE(I,A,C) ∧ INORABOVE(I,B,D) ∧
COMMON                           ¬=(A,B) ∧ COMTOP(A,B,E)
OWNERS                           → ¬INORABOVE(I,A,C) ∧ INORABOVE(I,B,D) ∧
                                   INORABOVE(I,E,A)

CREATE                           INORABOVE(I,A,B)
CONTAINS                         → ¬INORABOVE(I,A,B) ∧ CONTAINS(A,I)
ASSERTIONS

LINK UP            RECURSIVE     HIERARCHYGROUP(B,A,S) ∨
                   S3            [HIERARCHYGROUP(C,A,S) ∧ UPLINK(C,B)]
                                 → UPLINK(A,B)

COMMON                           UPLINK(A,C) ∧ UPLINK(B,C)
TOP 1                            → COMTOP(A,B,C)

COMMON             S5            =(C,(GENSYM))
TOP 2                            → COMTOP(A,B,C) ∧
(IF TOP 1 FAILS)                   HIERARCHYGROUP(C,A,(GENSYM)) ∧
                                   HIERARCHYGROUP(C,B,(GENSYM))


Figure 3-11. Rules for establishing record contents.




3.8.1 Eliminate redundant assertions.

The system first eliminates redundant INORABOVE assertions. INORABOVE(I,A,-)
is redundant if item I is also specified to be in or above another record B, and B
is physically above A (using transitivity). This makes INORABOVE(I,A,C)
redundant because INORABOVE(I,B,-) implies that I is above A.
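The redundancy test for INORABOVE assertions can be sketched in Python as below. The sketch assumes HIERARCHYGROUP assertions are (owner, member, set_name) triples and INORABOVE assertions are (item, record, context) triples; it also applies the second clause of the rule in Figure 3-11, which drops an assertion that names the same record in a different context. The sample assertions are illustrative only and the function names are hypothetical.

```python
def uplink(groups, lower, upper):
    """True if `upper` is physically above `lower` through owner/member sets."""
    owners = {}
    for owner, member, _set_name in groups:
        owners.setdefault(member, set()).add(owner)
    seen, stack = set(), [lower]
    while stack:
        for owner in owners.get(stack.pop(), ()):
            if owner == upper:
                return True
            if owner not in seen:
                seen.add(owner)
                stack.append(owner)
    return False


def eliminate_redundant_inorabove(inorabove, groups):
    """Erase each INORABOVE(I,A,C) that another assertion for I makes redundant."""
    remaining = set(inorabove)
    for item, a, c in sorted(inorabove):
        for other_item, b, d in sorted(remaining):
            if other_item != item or (a, c) == (b, d):
                continue
            if (a != b and uplink(groups, a, b)) or (a == b and c != d):
                remaining.discard((item, a, c))
                break
    return remaining


groups = {("PATIENT", "TREATMENT", "SYM02"), ("DOCTOR", "TREATMENT", "SYM06")}
inorabove = {
    ("SEX", "TREATMENT", "DOCTOR"),
    ("SEX", "PATIENT", "DOCTOR"),
    ("SEX", "PATIENT", "PORT"),
}
print(eliminate_redundant_inorabove(inorabove, groups))
```

Only one assertion for SEX survives, naming PATIENT, because PATIENT is physically above TREATMENT.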

3.8.2  Find  common  owners. 

If, following the elimination of redundancies, more than one INORABOVE assertion
remains for item I, then some record, say X, must be found which connects the
separate substructures in which I is referenced. COMTOP finds such a record,
or generates one if it does not exist. In either case all INORABOVE assertions
referencing I will be replaced by INORABOVE(I,X,-).

When record generation occurs, there will be a COMTOP assertion in the final
data base structure to indicate to the DBA the reason for the existence of the
generated record. A COMTOP assertion, as does a RING assertion, probably
signals an erroneous use or omission in the set of queries used by the structure
designer.
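A Python sketch of the common owner search follows. It assumes HIERARCHYGROUP assertions are (owner, member, set_name) triples and a name iterator stands in for (GENSYM); when several common owners exist it simply takes the first in alphabetical order, which is a simplification of whatever choice the system makes. All names are hypothetical.

```python
def owners_above(groups, record):
    """All records physically above `record`, following owners transitively."""
    owners = {}
    for owner, member, _set_name in groups:
        owners.setdefault(member, set()).add(owner)
    seen, stack = set(), [record]
    while stack:
        for owner in owners.get(stack.pop(), ()):
            if owner not in seen:
                seen.add(owner)
                stack.append(owner)
    return seen


def common_top(groups, a, b, names):
    """Find a record above both a and b, or generate one.

    Returns (top_record, groups, created) where `created` tells whether a
    new record and two new sets had to be generated.
    """
    shared = owners_above(groups, a) & owners_above(groups, b)
    if shared:
        return sorted(shared)[0], set(groups), False
    top = next(names)
    new_groups = set(groups) | {(top, a, next(names)), (top, b, next(names))}
    return top, new_groups, True


names = (f"SYM{n:02d}" for n in range(1, 100))
groups = {("HOSPITAL", "DOCTOR", "SYM10")}
top, new_groups, created = common_top(groups, "PATIENT", "DOCTOR", names)
print(top, created)
print(sorted(new_groups))
```

In this example nothing lies above PATIENT, so a record is generated above PATIENT and DOCTOR together with two new sets, and the caller would then record the corresponding COMTOP assertion for the DBA.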

3.8.3 Create CONTAINS assertions.

After all appropriate Common Top processing has been done there will be
exactly one INORABOVE(I,A,-) for each item I, indicating that the item should be
contained in Record A. All remaining assertions are therefore directly
transformed by the third rule to CONTAINS(A,I).



3.9 Example.

Figures 3-12 through 3-19 give a detailed example of the design process. The
user's queries are given in Figures 3-12 to 3-17. This is not a complete set of
queries for an anticipated implementation so the resulting data base structure
will probably not be complete. In each query, the system prompts are
underlined to distinguish them from user replies. For expository purposes,
these queries are somewhat contrived.

Figure 3-18 contains the assertions generated in the first phase. Figure 3-18
also indicates which additional assertions resulted from each query when the
queries were processed in numerical order. Had the queries been processed in
a different order, the resulting set of assertions would have been the same but
the subset of assertions added by a particular query might have been different.


Figure 3-19 contains the data base structure design as output by the system.
The COMTOP assertion for system-created record SYM12 (containing DOCNAME)
will lead the DBA to discover an error in the first query where DOCNAME is
indicated within the context of the PATIENT record and not in the context of a
possible confluent hierarchy. (PATIENT is the top level record of the first
query.)

A representation of the resulting data base structure using data structure
diagrams is given in Figure 3-20. The six queries used were initially created to
debug the automatic programmer of Chapter 2 using the data base of Figure
2-28. It is therefore interesting to note the close similarity between the human
designed data base of Figure 2-28 and the machine design of Figure 3-20.
Primary differences are the lack of inversion for the PATIENT record on RACE
and SEX (although the system suggests SEX as a CALCPORT), the non-existence
of a confluent hierarchy for DOCTOR and HOSPITAL records, and the omission of
a HOSPITAL-PATIENT hierarchy.

Since the queries consider a HOSPITAL-DOCTOR but not a DOCTOR-HOSPITAL
hierarchy, the confluency is not present in the designed data base structure.
Similarly, none of the queries reference a HOSPITAL-PATIENT hierarchy, so this
structure is also not included.



PRIMARY RECORD (MAIN)
*PATIENT
CONDITIONS FOR RETRIEVAL
*(SEX EQ 'M')
*NIL
ITEMS OR STATS TO BE DISPLAYED
*DOCNAME
*REPEAT
PRIMARY RECORD (REPEAT)
*TREATMENT
CONDITIONS FOR RETRIEVAL
*(COUNT GE 3)
PRIMARY RECORD (COUNT)
*ORDERS
CONDITIONS FOR RETRIEVAL
*NIL
*NIL
ITEMS OR STATS TO BE DISPLAYED
*DIAGNOSIS
*NIL
*NIL


Figure 3-12. Query number one: "For all male patients display the
associated doctor's name and all diagnoses for which at least three
orders have been prescribed."


PRIMARY RECORD (MAIN)
*DOCTOR
CONDITIONS FOR RETRIEVAL
*(COUNT GT 5)
PRIMARY RECORD (COUNT)
*PATIENT
CONDITIONS FOR RETRIEVAL
*(SEX EQ 'M')
*NIL
*NIL
ITEMS OR STATS TO BE DISPLAYED
*DOCNAME
*REPEAT
PRIMARY RECORD (REPEAT)
*TREATMENT
CONDITIONS FOR RETRIEVAL
*(SEX EQ 'M')
*NIL
ITEMS OR STATS TO BE DISPLAYED
*DIAGNOSIS
*PATNO
*PATNAME
*NIL
*NIL


Figure 3-13. Query number two: "For all doctors having more than five
patients, display the doctor's name and the diagnosis, patient's name and
number for every male patient being treated by that doctor."

PRIMARY RECORD (MAIN)
*PATIENT
CONDITIONS FOR RETRIEVAL
*(COUNT EQ 6)
PRIMARY RECORD (COUNT)
*DOCTOR
CONDITIONS FOR RETRIEVAL
*NIL
*(PATNO LT 6000)
*NIL
ITEMS OR STATS TO BE DISPLAYED
*PATNAME
*PATNO
*REPEAT
PRIMARY RECORD (REPEAT)
*BILLENTRY
CONDITIONS FOR RETRIEVAL
*(AMOUNT GT 100)
*NIL
ITEMS OR STATS TO BE DISPLAYED
*AMOUNT
*CODE
*NIL
*HOSPNAME
*NIL


Figure 3-14. Query number three: "For all patients with exactly 6
doctors and who have a patient number less than 6000, display the
patient's name, number, hospital and the amount and code for all bills
over $100."



PRIMARY RECORD (MAIN)
*HOSPITAL
CONDITIONS FOR RETRIEVAL
*ANY
PRIMARY RECORD (ANY)
*DOCTOR
CONDITIONS FOR RETRIEVAL
*(DOCNAME EQ 'SMITH')
*ANY
PRIMARY RECORD (ANY)
*TREATMENT
CONDITIONS FOR RETRIEVAL
*(DIAGNOSIS EQ 'TUBERCULOSIS')
*NIL
*NIL
*NIL
ITEMS OR STATS TO BE DISPLAYED
*HOSPNO
*NIL


Figure 3-15. Query number four: "Display the hospital number for all
hospitals that have a doctor named Smith who has diagnosed
tuberculosis."



PRIMARY RECORD (MAIN)
*DOCTOR
CONDITIONS FOR RETRIEVAL
*ANY
PRIMARY RECORD (ANY)
*TREATMENT
CONDITIONS FOR RETRIEVAL
*(DIAGNOSIS EQ 'PNEUMONIA')
*NIL
*(AVE GT 3)
PRIMARY RECORD (AVE)
*TREATMENT
CONDITIONS FOR RETRIEVAL
*NIL
ITEMS OR STATS FOR AVE
*COUNT
PRIMARY RECORD (COUNT)
*ORDERS
CONDITIONS FOR RETRIEVAL
*NIL
*NIL
*NIL
ITEMS OR STATS TO BE DISPLAYED
*DOCNAME
*COUNT
PRIMARY RECORD (COUNT)
*ORDERS
CONDITIONS FOR RETRIEVAL
*NIL
*NIL


Figure 3-16. Query number five: "For all doctors who have diagnosed
pneumonia or whose average number of orders per treatment is less
than three, display the doctor's name and a count of the number of
orders prescribed."




PRIMARY RECORD (MAIN)
*PATIENT
CONDITIONS FOR RETRIEVAL
*(PATNO EQ 10)
*NIL
ITEMS OR STATS TO BE DISPLAYED
*MIN
PRIMARY RECORD (MIN)
*ORDERS
CONDITIONS FOR RETRIEVAL
*NIL
ITEMS OR STATS FOR AVE
*ORDNO
*NIL
*NIL


Figure 3-17. Query number six: "For the patient with the number 10,
display the smallest order number prescribed for that patient."



ASSERTION

CALCPORT (PATNO)
CALCPORT (DIAGNOSIS)
CALCPORT (DOCNAME)
CALCPORT (SEX)
ABOVE (PATIENT ORDERS)
ABOVE (DOCTOR ORDERS)
ABOVE (HOSPITAL DOCTOR)
ABOVE (PATIENT BILLENTRY)
ABOVE (PATIENT DOCTOR)
ABOVE (DOCTOR TREATMENT)
ABOVE (DOCTOR PATIENT)
ABOVE (TREATMENT ORDERS)
ABOVE (PATIENT TREATMENT)
INORABOVE (ORDNO ORDERS PATIENT)
INORABOVE (HOSPNO HOSPITAL PORT)
INORABOVE (DOCNAME DOCTOR HOSPITAL)
INORABOVE (HOSPNAME PATIENT PORT)
INORABOVE (CODE BILLENTRY PATIENT)
INORABOVE (AMOUNT BILLENTRY PATIENT)
INORABOVE (PATNAME PATIENT PORT)
INORABOVE (PATNO PATIENT PORT)
INORABOVE (PATNAME TREATMENT DOCTOR)
INORABOVE (PATNO TREATMENT DOCTOR)
INORABOVE (DIAGNOSIS TREATMENT DOCTOR)
INORABOVE (SEX TREATMENT DOCTOR)
INORABOVE (DOCNAME DOCTOR PORT)
INORABOVE (SEX PATIENT DOCTOR)
INORABOVE (DIAGNOSIS TREATMENT PATIENT)
INORABOVE (DOCNAME PATIENT PORT)
INORABOVE (SEX PATIENT PORT)


Figure 3-18. Assertions generated from queries 1-6.




COMTOP (PATIENT DOCTOR SYM12)
CALCPORT (SEX)
CALCPORT (DOCNAME)
CALCPORT (DIAGNOSIS)
CALCPORT (PATNO)
HIERARCHYGROUP (SYM12 PATIENT SYM15)
HIERARCHYGROUP (SYM12 DOCTOR SYM13)
HIERARCHYGROUP (HOSPITAL DOCTOR SYM10)
HIERARCHYGROUP (PATIENT BILLENTRY SYKHOS)
HIERARCHYGROUP (DOCTOR TREATMENT SYM06)
HIERARCHYGROUP (TREATMENT ORDERS SYMOfl)
HIERARCHYGROUP (PATIENT TREATMENT SYM02)
EXCHANGE (PATNO PATIENT DOCTOR TREATMENT)
EXCHANGE (SEX PATIENT DOCTOR TREATMENT)
DBKEY (ORDERS SYM23)
DBKEY (BILLENTRY SYM22)
DBKEY (TREATMENT SYM21)
DBKEY (DOCTOR SYM20)
DBKEY (PATIENT SYM19)
DBKEY (HOSPITAL SYM18)
DBKEY (SYM12 SYM17)
CONTAINS (ORDERS ORDNO)
CONTAINS (HOSPITAL HOSPNO)
CONTAINS (PATIENT HOSPNAME)
CONTAINS (BILLENTRY CODE)
CONTAINS (BILLENTRY AMOUNT)
CONTAINS (PATIENT PATNAME)
CONTAINS (PATIENT PATNO)
CONTAINS (TREATMENT DIAGNOSIS)
CONTAINS (PATIENT SEX)
CONTAINS (SYM12 DOCNAME)


Figure 3-19. Generated data base structure.




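[Diagram. Record types and the items shown with them: HOSPITAL (hospno),
SYM12 (docname), PATIENT (patname, patno), DOCTOR, TREATMENT (diagnosis),
BILLENTRY (code, amount), ORDERS (ordno). One set label, SYM15, is legible.]

Figure 3-20. Data structure diagram for Figure 3-19.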



3.10 Alternative Implementation.

An earlier and different version of the automatic data base designer was also
implemented. The difference between the two versions should interest those
who build knowledge based systems.


To build the DMLP I had to capture my knowledge of programming in a set of
rules which could be used and understood by the APG system. The first of the
two data base designers was also built in this way. The result was a system
which at least partially modelled my own behavior when designing data base
structures: evolutionary. That is, the structure evolves during a period of time
in which I become more familiar with user requirements. This has the
advantage that at all times my current understanding of user requirements is
captured in the current data structure.

Implementing this strategy in an automatic designer leads to some difficulties.
These difficulties arise because, as the structure evolves, it is frequently
necessary to make assumptions or to choose from several equally viable
alternatives. This makes the process rather ad hoc, especially later in the
design when further constraints require that some earlier assumptions and
choices be undone. Programming this strategy therefore requires a
precognizance of all possible occurrences of the situations where ad hoc
decisions are necessary.




It became quickly obvious that such pre-planning is very difficult. There were
frequent additions and modifications to the system for handling conditions that
had not been foreseen. Furthermore it was impossible to prove any consistency
in system outputs.


The second implementation, which grew out of the frustrations from the first,
differs in that the design does not evolve. The second system puts information
extracted from queries into an unstructured pool until a command is given to
generate a design. One improvement is that all different orderings of inputs
yield identical results. Other improvements are that outputs are easily provable
from inputs, and errors and inconsistencies in the input are more visible in the
resulting data structure.
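
The order independence claimed for the second implementation can be illustrated with a minimal Python sketch. It is not the original system: it assumes assertions are held as tuples in a set, and the design step (the hypothetical `generate_design`) is reduced to a pure function of that set. The assertions are taken from Figure 3-18; their grouping into batches is only illustrative.

```python
from itertools import permutations

def extract_assertions(pool, new_assertions):
    """Add the assertions extracted from one query to the unstructured pool."""
    pool.update(new_assertions)

def generate_design(pool):
    """Hypothetical design step: depends only on the contents of the pool.

    Here it merely groups the pooled ABOVE assertions by upper record type.
    """
    design = {}
    for kind, *args in sorted(pool):
        if kind == "ABOVE":
            upper, lower = args
            design.setdefault(upper, []).append(lower)
    return design

# Illustrative batches of assertions (contents from Figure 3-18).
batches = [
    {("ABOVE", "PATIENT", "ORDERS"), ("CALCPORT", "PATNO")},
    {("ABOVE", "DOCTOR", "ORDERS"), ("ABOVE", "PATIENT", "TREATMENT")},
    {("ABOVE", "HOSPITAL", "DOCTOR")},
]

designs = []
for ordering in permutations(batches):
    pool = set()
    for batch in ordering:
        extract_assertions(pool, batch)
    designs.append(generate_design(pool))

# Every ordering of the inputs yields the same design.
all_equal = all(d == designs[0] for d in designs)
print(all_equal)
```

Because nothing is decided until the whole pool is available, no early choice ever has to be undone, which is the property the evolutionary version lacked.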


Although less rigorous than the second implementation, the first method is
probably better for human designers. The second method requires storage of
an unrelated set of facts (assertions), something which computers do well but
people don't.



4 EXTENSIONS. 

This chapter describes further work that can be done along the lines set out in
the preceding chapters. I will first discuss immediate extensions to the existing
systems for retrieval program generation and data structure design. There are
also two sub-sections that describe extension of the techniques into other data
base programming areas: data base update and data base restructuring.

4.1.  Extensions  to  the  DMLP. 

The extensions discussed below are immediate because a conceptualization of
their implementation in the existing systems is quite clear. I would estimate an
average implementation time of 1 week per itemized extension, or approximately
3 man-months for all those itemized below.

4.1.01 Generation of the Data Division.

The program generator creates only a Procedure Division. There are several
other divisions needed to make a complete COBOL program. The most
significant of these is the Data Division. The reason for non-generation of the
Data Division is that the major portion is generated by the Sub-Schema
processor defined by the DBTG. Incorporation of the major portion of the Data
Division can therefore be accomplished by including an appropriate COPY
statement. A Data Division so derived must be extended only to include
temporary working storage locations that are used by the Procedure Division.



The generated Data Division would appear as follows:

DATA DIVISION.
FILE SECTION.
FD DUMMY COPY <sub-schema-library-file>.
WORKING-STORAGE SECTION.
77 X1 PIC <> USAGE <>.
   .
   .
77 Xn PIC <> USAGE <>.

The <sub-schema-library-file> must be supplied by the user. X1 through
Xn are the working storage variables used by the Procedure Division. Picture
and Usage clause parameters will be the equivalent of the Picture and Usage
clauses for the source data name if the variable is used to calculate a total,
average, minimum or maximum of the item identified by the source data name.
Otherwise the variable will have "PICTURE 9(10) USAGE COMP".
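
The rule for choosing Picture and Usage clauses can be sketched in Python as follows. This is only an illustration of the rule stated above, not the generator itself. The function names, the library file name `SUBSCHEMA-LIB`, and the picture `9(6)` with usage `DISPLAY` assumed for ORDNO are all hypothetical.

```python
INHERITING_STATISTICS = {"TOTAL", "AVERAGE", "MIN", "MAX"}

def working_storage_entry(name, statistic, source_picture=None, source_usage=None):
    """Build one level-77 entry for a working storage variable."""
    if statistic in INHERITING_STATISTICS:
        # Totals, averages, minima and maxima copy the source item's clauses.
        picture, usage = source_picture, source_usage
    else:
        # Anything else (a count, for example) gets the default clauses.
        picture, usage = "9(10)", "COMP"
    return f"77 {name} PICTURE {picture} USAGE {usage}."

def data_division(sub_schema_library_file, variables):
    """Assemble the Data Division text from the COPY file and the variables."""
    lines = [
        "DATA DIVISION.",
        "FILE SECTION.",
        f"FD DUMMY COPY {sub_schema_library_file}.",
        "WORKING-STORAGE SECTION.",
    ]
    lines += [working_storage_entry(*v) for v in variables]
    return "\n".join(lines)

# X1 holds the minimum ORDNO (assumed picture and usage); X2 holds a count.
variables = [
    ("X1", "MIN", "9(6)", "DISPLAY"),
    ("X2", "COUNT"),
]
text = data_division("SUBSCHEMA-LIB", variables)
print(text)
```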

4.1.02 Disjunctive port selection.

In the present system, if a calculated key is used in a disjunction such that it is
used as a port, then it must be so used in every disjunct. This requirement can
be relaxed by requiring only that every disjunct must contain a port. Note that
this requirement does not state that each such port must refer to the same
calculated key.
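
The difference between the present requirement and the relaxed one can be made concrete with a small Python sketch. The representation of a disjunct as a dictionary naming its port (or `None`) is an assumption made for the example, as are the test conditions other than those on PATNO and DOCNAME, which are calculated keys in Figure 3-18.

```python
def satisfies_present_rule(disjuncts):
    """Present system: a key used as a port must be the port of every disjunct."""
    ports = {d["port"] for d in disjuncts}
    return len(ports) == 1

def satisfies_relaxed_rule(disjuncts):
    """Relaxed rule: if ports are used at all, every disjunct must contain one."""
    ports = [d["port"] for d in disjuncts]
    return all(p is None for p in ports) or all(p is not None for p in ports)

# Each disjunct records which calculated key, if any, it uses as a port.
same_key = [{"test": "PATNO EQ 10", "port": "PATNO"},
            {"test": "PATNO EQ 20", "port": "PATNO"}]
mixed_keys = [{"test": "PATNO EQ 10", "port": "PATNO"},
              {"test": "DOCNAME EQ 'JONES'", "port": "DOCNAME"}]
one_without = [{"test": "PATNO EQ 10", "port": "PATNO"},
               {"test": "AMOUNT GT 100", "port": None}]

for name, case in (("same_key", same_key), ("mixed_keys", mixed_keys),
                   ("one_without", one_without)):
    print(name, satisfies_present_rule(case), satisfies_relaxed_rule(case))
```

Only the `mixed_keys` case separates the two rules: it is rejected at present but would be accepted after the extension.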




4.1.03 Recursive processing.

A data base that contains a structure of the type given in Figure 4-1 can
contain record types that are recursively linked to themselves. For example,
the Bill of Materials parts structure for a manufacturing or assembly operation
can be stored in this type of data base structure.


[Diagram: RECORD A and RECORD B, connected by two sets.]


Figure  4-1.  Recursive  network  structure. 


In its present state, the system is "almost" capable of processing such a
structure to a predetermined number of levels but in a very inelegant way; the
user must specify each level to be processed, and one or more separate
program paragraphs will result for each such level.


A much more elegant generated program would contain recursively entered
paragraphs, such that each level can be processed by the same program
segment. This type of program generation would require special recognition of
a recursive structure, incorporation of stack logic, and inclusion in HI-IQ of
special commands for this type of retrieval.
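
The role of the stack logic can be seen in a short Python sketch of a parts explosion. It is an illustration under assumed data, not generated output: the parts and quantities are invented, and the dictionary stands in for the recursive structure of Figure 4-1. A single loop body, playing the part of the recursively entered paragraph, processes every level.

```python
def explode(parts, root):
    """List every component below `root`, at any depth, using an explicit stack."""
    stack = [(root, 0, 1)]          # (part, level, quantity per unit of root)
    result = []
    while stack:
        part, level, quantity = stack.pop()
        result.append((level, part, quantity))
        # Push components in reverse so they are visited in their listed order.
        for component, count in reversed(parts.get(part, [])):
            stack.append((component, level + 1, quantity * count))
    return result

# Hypothetical bill of materials: part -> list of (component, count).
parts = {
    "BICYCLE": [("FRAME", 1), ("WHEEL", 2)],
    "WHEEL": [("RIM", 1), ("SPOKE", 32)],
}

explosion = explode(parts, "BICYCLE")
for level, part, quantity in explosion:
    print("  " * level + part, quantity)
```

The number of levels is never stated by the user; the stack grows and shrinks with the depth of the structure actually found.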


4.1.04 Allow set name specification by the user.

The system should allow knowledgeable users to specify set names, e.g. define
portions of the access path. This would allow resolution of possible ambiguities
by the user, and user override of the program generator where appropriate.

This improvement would be necessary to erase the "almost" in the second
paragraph of 4.1.03, above.


4.1.05 Arithmetic and temporary items.

This would allow the user to specify the calculation of values from data base
items. Temporary items would be implemented to store the results of arithmetic
and also to allow calculations or tests involving several occurrences of the same
item.

4.1.06 Distinguish ERRORSTATUS values.

At present, the system examines ERRORSTATUS only for a non-zero condition,
and then assumes that an expected error condition, such as end of set, has
occurred. By examining the actual value of ERRORSTATUS, the system can
determine if the condition is as expected or truly represents an error that
should be reported to the user.



A preferred implementation would require modifications to the DBTG
specification. This preferred implementation would provide for an error filter
resident in the data management routines but controlled by the run unit. This
filter could be set to ignore certain conditions, trap on other conditions, and
report a third set of conditions directly to the user.


As  an  alternative,  such  an  error  filter  could  also  be  directly  incorporated  as  a 
standard  section  of  the  generated  Procedure  Division. 
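
A rough Python sketch of such an error filter follows. The class, its method names, and in particular the numeric ERRORSTATUS values are hypothetical; the actual DBTG status codes are not given here, and only the three dispositions (ignore, trap, report) come from the text.

```python
IGNORE, TRAP, REPORT = "ignore", "trap", "report"

# Hypothetical ERRORSTATUS values, chosen only for the example.
END_OF_SET = 1
RECORD_NOT_FOUND = 2
DATA_BASE_DAMAGED = 9

class ErrorFilter:
    """Maps each ERRORSTATUS value to one of three dispositions."""

    def __init__(self, default=REPORT):
        self.default = default      # conditions nobody planned for reach the user
        self.actions = {}

    def set_action(self, status, action):
        """Called by the run unit to control how one condition is handled."""
        self.actions[status] = action

    def classify(self, status):
        if status == 0:             # zero means no error condition occurred
            return IGNORE
        return self.actions.get(status, self.default)

error_filter = ErrorFilter()
error_filter.set_action(END_OF_SET, TRAP)           # expected, handled by the program
error_filter.set_action(RECORD_NOT_FOUND, IGNORE)

for status in (0, END_OF_SET, RECORD_NOT_FOUND, DATA_BASE_DAMAGED):
    print(status, error_filter.classify(status))
```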


4.1.07  Efficient  use  of  sorted  sets. 

DBTG provides for sets whose members are ordered on key values. Sequential
searches that are dependent on the key value can be terminated when the key
value of the current set member is higher than the key value range for the
search condition.
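
The saving is easy to see in a minimal Python sketch, which assumes the set members are available as a list of keys in ascending order. The function name and the sample keys are invented for the illustration.

```python
def search_sorted_set(member_keys, low, high):
    """Collect keys in [low, high]; stop at the first key above the range.

    Returns the matching keys and the number of members examined.
    """
    matches, examined = [], 0
    for key in member_keys:
        examined += 1
        if key > high:
            break                   # no later member can satisfy the condition
        if key >= low:
            matches.append(key)
    return matches, examined

member_keys = [3, 7, 10, 12, 18, 25, 40]
matches, examined = search_sorted_set(member_keys, 10, 15)
print(matches, examined, len(member_keys))
```

Here five of the seven members are examined; without the ordering the search would have to visit all of them.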


4.1.08 Ordering of conjuncts and disjuncts.

When a compound conjunction is to be tested, the resulting program will be
more efficient if the conjuncts are ordered so that the cheapest and most
restrictive tests are performed first. Determining the best order is a complex
task since the cheapest test may not be the most restrictive.

Disjuncts should be ordered with the cheapest and least restrictive test first.




Possible  measures  of  test  cost  are  the  number  of  lines  of  generated  code,  the 
number  of  performs  in  the  generated  code,  or  the  number  of  data  access 
statements  required  for  the  test. 

A determination of expected restrictiveness is more complicated although a
certain simple-minded ordering is possible with equality tests rated as most
restrictive, then weak inequalities, and finally strong inequalities. Further
determination of restrictiveness would be more costly and would require
knowledge of data base content; see the next paragraph.
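
One possible form of the simple-minded ordering is sketched below in Python. Several things are assumptions of the sketch rather than part of the system: cost is taken to be the number of data access statements (one of the measures suggested above), cost is compared before restrictiveness, and the relation names other than EQ are invented. The restrictiveness ranking is the one given in the text.

```python
# Lower rank = rated more restrictive: equality, weak inequality, strong inequality.
RESTRICTIVENESS_RANK = {"EQ": 0, "LE": 1, "GE": 1, "LT": 2, "GT": 2}

def order_conjuncts(tests):
    """Cheapest first; among equally cheap tests, most restrictive first."""
    return sorted(tests, key=lambda t: (t["cost"], RESTRICTIVENESS_RANK[t["rel"]]))

def order_disjuncts(tests):
    """Cheapest first; among equally cheap tests, least restrictive first."""
    return sorted(tests, key=lambda t: (t["cost"], -RESTRICTIVENESS_RANK[t["rel"]]))

# Hypothetical tests; "cost" counts data access statements needed for the test.
tests = [
    {"name": "A", "cost": 2, "rel": "EQ"},
    {"name": "B", "cost": 1, "rel": "GT"},
    {"name": "C", "cost": 1, "rel": "EQ"},
]

conjunct_order = [t["name"] for t in order_conjuncts(tests)]
disjunct_order = [t["name"] for t in order_disjuncts(tests)]
print(conjunct_order, disjunct_order)
```

Putting cost ahead of restrictiveness is only one way to resolve the conflict noted above between the cheapest and the most restrictive test; a weighted combination would be another.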

4.1.09 Ordering of test arguments.

Tests are always of the form (A REL B), where "A" and "B" have some
determinable value and REL is an equality or one of the inequality relations. If
only one of A or B requires a statistic calculation in order to determine its value,
then it is more efficient to determine the value of the other (non statistical)
argument first. This is so because, if the value of the non statistical argument
is known, then it is frequently possible to determine the truth value of the test
without calculating the exact value of the statistic. See section 2.9.
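
As a small illustration in Python, consider a test of the form (COUNT LT limit) in which the limit is the non statistical argument and is already known. The function below is a sketch under that assumption and is not taken from section 2.9: counting stops as soon as the truth value is settled.

```python
def count_less_than(items, limit):
    """Decide COUNT(items) < limit without necessarily counting every item.

    Returns the truth value and the number of items actually examined.
    """
    seen = 0
    for _ in items:
        seen += 1
        if seen >= limit:
            return False, seen      # the count can only grow; the test has failed
    return True, seen

few = count_less_than(["order-1", "order-2"], 3)
many = count_less_than(range(1000), 3)
print(few, many)
```

In the second case only three of the thousand items are examined. Had the count been computed first, all of them would have been visited.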

4.1.10  Stronger  system  tests. 

It is possible to strengthen the test used to terminate the calculation of a
statistic (as discussed above) if the type of REL is taken into consideration. See
section 2.9.




4.1.11 Use of data characteristics.

This  extension  is  discussed  in  detail  in  section  2.9. 

4.1.12 Removal of equivalent paragraphs.

As can be seen in Figure 2-31, where paragraphs 105 and 301 are equivalent,
the system may generate duplicate paragraphs.

A "quicky"  solution  would  be  to  search  the  resulting  program  for  duplicate 
paragraphs.  It  would  be  more  elegant  to  catalog  each  paragraph  as  a 
subroutine  as  it  is  generated.  Each  such  catalogued  paragraph  then  becomes  a 
theorem  that  can  be  used  for  subsequent  program  generation. 
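
The cataloguing approach might look like the following Python sketch. The class, the paragraph naming scheme, and the sample COBOL statements are all hypothetical, and equivalence is reduced here to identical text after spacing is normalized, which is a much weaker notion than the theorem-based reuse suggested above.

```python
class ParagraphCatalog:
    """Catalogs generated paragraphs so that an equivalent body is emitted once."""

    def __init__(self, first_number=100):
        self._name_by_body = {}
        self._next_number = first_number

    def paragraph_for(self, body_lines):
        """Return (name, is_new) for a paragraph body given as a list of lines."""
        key = tuple(" ".join(line.split()) for line in body_lines)  # normalize spacing
        if key in self._name_by_body:
            return self._name_by_body[key], False
        name = f"P{self._next_number}"
        self._next_number += 1
        self._name_by_body[key] = name
        return name, True

catalog = ParagraphCatalog()
first = catalog.paragraph_for(["ADD 1 TO X1.", "MOVE ORDNO TO X2."])
second = catalog.paragraph_for(["ADD  1 TO X1.", "MOVE ORDNO TO X2."])
third = catalog.paragraph_for(["ADD 1 TO X3."])
print(first, second, third)
```

The second request finds the catalogued paragraph and can PERFORM it instead of generating a duplicate.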

4.1.13 Combination of statistical calculations.

If  several  different  statistics  are  to  be  calculated  over  the  same  domain  then  the 
generated  program  will  contain  separate  program  segments  for  the  calculation  of 
each  type  of  statistic.  That  is,  all  totals  over  the  same  domain  will  be  calculated 
within  the  scope  of  a single  PERFORM.  However,  if  a count  is  also  to  be 
calculated  over  the  same  domain  it  will  be  counted  in  a separate  section  of  the 
program  that  is  invoked  by  a different  perform. 

This method of program generation leads to larger programs than necessary and
can have an extremely negative effect on run-time efficiency.




A solution would require the detection of similar calculation domains by the
Request Handler. This can be done by checking for similar LINKS assertions
prior to making a new LINKS assertion. It would be necessary to change the
LINKS assertions so that multiple actions can be specified.
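
The effect of attaching multiple actions to one domain can be sketched in Python. The function below is an assumed illustration, not the Request Handler: it makes one pass over the domain and returns whichever statistics were requested, in place of one pass per type of statistic.

```python
def combined_statistics(values, wanted):
    """Compute every requested statistic in a single pass over the domain."""
    total, count = 0, 0
    minimum = maximum = None
    for v in values:
        count += 1
        total += v
        minimum = v if minimum is None else min(minimum, v)
        maximum = v if maximum is None else max(maximum, v)
    results = {
        "COUNT": count,
        "TOTAL": total,
        "MIN": minimum,
        "MAX": maximum,
        "AVERAGE": total / count if count else None,
    }
    return {name: results[name] for name in wanted}

# Hypothetical domain: the AMOUNT items of one patient's BILLENTRY records.
amounts = [4, 9, 2]
stats = combined_statistics(amounts, ["TOTAL", "COUNT"])
print(stats)
```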

4.1.14 Interfaces to externally coded program segments.

As it stands now the system generates a complete Procedure Division. The data
retrieved by the generated program is simply displayed. In an actual operating
environment, it may sometimes be desirable to process the retrieved data with a
report writer, perform complex analyses, or output it to other devices. This
suggests using the program generator to write sections of procedure that,
combined with other sections, form a complete Procedure Division.

The machine generated sections would serve as data locators. Rather than
displaying the data, such sections would simply assure its presence in Working
Storage.




4.2 Extensions to the Data Structure Designer.

4.2.1 G^ie_rfje design.

The designer ignores many data characteristics that might result in the design of
a more efficient data base. A data base is more efficient if it requires less
storage space, or if the total time required to process information therein is
reduced.

There are also design issues, unrelated to the structural design, that must be
considered for a complete data base design. Extending the designer to consider
these issues and to make use of data characteristics is a proper next step.


4.2.2 Practicality of paradigm.

The practicality of accumulating a complete set of queries prior to attempting
the design is open to question and should be investigated.


4.2.3 Further cost study.

Empirical study of the costs of human performance on equivalent design tasks is
needed to determine if automatic data base design is less costly.

4.2.4 Recursive structures.

There is a particular substructure in network data bases for the storage of
recursively linked records (Figure 4-1). If the programmer is to be extended as
in section 4.1.03, then the designer should be extended in an equivalent fashion.



4.2.5  Explore  alternative  designs. 

The  data  structure  design  arrived  at  by  the  system  may  be  one  of  many 
possible  designs  for  the  set  of  queries.  An  extension  to  allow  the  generation  of 
alternative  designs  (or  at  least  indicate  to  the  DBA  which  portions  of  the  design 
are  fixed  and  which  are  not)  is  suggested. 




4.3 Automation of Update.

The traditional view of update is that it consists of three types of tasks: record
addition, record deletion, and record modification. The latter two tasks require
retrieval of the record in question, which precedes the execution of the actual
update. Not much modification of the programmer would be needed to provide
it with an update capability for deletion and modification.


Since records in a network data base may be linked to many other records,
addition of records to a network requires that all links be established. A
record addition must therefore be preceded by several record retrievals, each
retrieval locating one of the records to which the new record is to be linked.
Code generation for these retrievals has already been demonstrated.

Determination  of  the  set  of  records  to  be  retrieved  and  the  criteria  for  their 
retrieval  is  still  an  unsolved  problem. 


This problem does not appear to be too difficult, however. The set of records
is determined by finding all HIERARCHYGROUPs that contain the record to be
added. The criteria for the retrieval of this set of records cannot be
determined without some knowledge of the LOCATION MODE parameters defined
in the Schema. The system would therefore have to be extended so that it can
"understand" these parameters.
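
The first half of the problem, finding the set of records, can be sketched in Python using some of the HIERARCHYGROUP entries of Figure 3-19. The sketch assumes that each entry reads (upper record, lower record, set name) and that a group "contains" the new record when the record appears in the lower position; neither reading is confirmed here. The retrieval criteria, which depend on LOCATION MODE, are not addressed.

```python
# A subset of the HIERARCHYGROUP entries of Figure 3-19,
# read here as (upper record, lower record, set name).
HIERARCHYGROUPS = [
    ("SYM12", "DOCTOR", "SYM13"),
    ("HOSPITAL", "DOCTOR", "SYM10"),
    ("DOCTOR", "TREATMENT", "SYM06"),
    ("PATIENT", "TREATMENT", "SYM02"),
]

def records_to_retrieve(new_record_type, groups=HIERARCHYGROUPS):
    """Return (upper record, set name) pairs that a new record must be linked into."""
    return sorted((upper, set_name)
                  for upper, lower, set_name in groups
                  if lower == new_record_type)

for_treatment = records_to_retrieve("TREATMENT")
print(for_treatment)
```

Under these assumptions, adding a TREATMENT record requires that one DOCTOR and one PATIENT record be located first.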



An update program generator should also be influenced in its programming by
the batch vs on-line dichotomy. The batch processing type of program cannot
communicate with the user in real time, a restriction that does not apply to the
on-line program. The system already has a limited capability for interactive
information retrieval, and it should be possible to handle on-line transaction
processing in the same way.

Batch updating requires that the structure of both the transaction and the data
base be defined. Although I have not investigated this in detail, I am confident
that the methods used for retrieval program generation are applicable without
major extension.




4.4 Data Base Restructuring.

Restructuring becomes necessary when errors or oversights are detected in the
current data structure or when changes in demand require a change in data
structure. Such a data structure change may make certain programs obsolete
and may also require that the data itself be manipulated and restructured.
Restructuring can therefore be very costly, especially for large data bases.


The DMLP can have a beneficial impact on the costs of restructuring by reducing
the reprogramming effort, thereby reducing programming costs and elapsed
time. In addition to this immediate benefit, the technology and theory
developed in the preceding chapters suggest a method that can totally eliminate
the disruption usually associated with restructuring. The disruption is
eliminated by "dissolving" the restructuring process in the normal usage of the
data base.

An explanation of this method requires the introduction of a new concept: data
structure generations. Whenever a structure is modified, the old structure
description is not discarded; instead the new structure is recognized as a new
structure generation and is assigned an incremental generation number. The
current structure generation number is used to mark records when they are
stored so that the appropriate structure description can be used when these
records are later retrieved. Under this scheme multiple generations of
structure would be allowed to exist simultaneously in the data base.




Programs  are  also  marked  with  the  identifying  number  of  the  generation  in 
effect  when  the  programs  were  written. 

The system is therefore capable of detecting differences between the assumed
structure in a program and the actual structure in the data base by comparing
structure generation markers. As a result, whenever a program attempts to
navigate through some structure that belongs to a different generation than the
program, the system provides a subprogram that makes the data structure
differences transparent to the program. The previous chapters have shown
how these subprograms can be generated whenever the generation differences
are limited to attribute migration or hierarchical inversion.
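
The comparison of generation markers can be sketched in Python as follows. Everything concrete in the sketch is assumed for illustration: the record layout, the function names, the sample data, and the particular attribute migration (HOSPNAME leaving PATIENT between generations 1 and 2). The conversion function stands in for the generated subprogram.

```python
def read_record(record, program_generation, converters):
    """Return the record's data in the form the program's generation expects."""
    stored = record["generation"]
    if stored == program_generation:
        return record["data"]
    convert = converters.get((stored, program_generation))
    if convert is None:
        # Differences that cannot be resolved automatically need human intervention.
        raise LookupError(
            f"no conversion from generation {stored} to {program_generation}")
    return convert(record["data"])

def drop_hospname(data):
    """Generation 1 to 2: HOSPNAME has migrated out of PATIENT (illustrative)."""
    return {item: value for item, value in data.items() if item != "HOSPNAME"}

converters = {(1, 2): drop_hospname}

old_record = {"generation": 1,
              "data": {"PATNO": 10, "PATNAME": "SMITH", "HOSPNAME": "MERCY"}}
new_record = {"generation": 2,
              "data": {"PATNO": 11, "PATNAME": "JONES"}}

# A program written under generation 2 sees both records in the same form.
seen_by_program = [read_record(r, 2, converters) for r in (old_record, new_record)]
print(seen_by_program)
```

Records of both generations coexist in the data base, and the stored form of the old record is left untouched by the read.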

Generation data structures may also differ in the paths that exist between
record types. Again, this is a difference easily resolved.

Differences that are not so easily resolvable are structures that differ in
possible information content. In such cases the system must ask for human
intervention or operate from a set of well-known assumptions. There may be
other structural changes that cannot be handled automatically.

Implementation  of  this  type  of  system  provides  a choice  regarding  the  actual 
reformatting  of  the  data.  In  highly  volatile  data  bases  actual  reformatting  can 
be  indefinitely  postponed  because  of  the  rapid  decay  of  out-of-date  structures. 




A second alternative is to cause automatic reformatting of out-of-date
structures as they are encountered in the course of normal processing.

The latter method assumes that "local" restructuring is feasible. I would expect
situations to occur in which this assumption is not valid. Since all data is linked
together, "chain reactions" might occur: a single change causes access to
another record that must also be modified, causing access to another
out-of-date structure, etc.




5 COST EFFECTIVENESS OF AUTOMATIC PROGRAMMING.

The true costs of this application of Artificial Intelligence cannot be measured
until the application process is completed, that is, until the program generator is
used with an actual data base to solve actual retrieval problems. Some
measurement of program generation and data structure design has been done
however. Considering the fact that the implementation utilizes interpreted LISP
and Micro-Planner, both of which are notoriously expensive [Sussman and
McDermott 1972], these costs appear to be quite promising.

5.1 Measures of Automatic Program Generation.

Table 5-1 gives values of various measures of the program generation process.
The queries and their corresponding programs are named P1 through P6. The
first column of the table contains a measure of the size of the problem
statement. The values in this column indicate the number of input lines
containing entries other than "NIL" (a terminator) entered for the query. The
next two columns are a measure of the complexity of the programming task;
indicated for each program is the number of rules the system attempted to use
and the number of such attempts that were successful.




PROG          RULES         COBOL          RUN       K-CORE  $
       LINES  USE    SUC    LINES  PARAS   TIME      SEC

P1     10     58     49     -      -       -         -       -
P2     13     8^     77     32     8       4:09.1    5988    64.70
P3     M      118    102    67     11      6:22.5    9229    97.87
P4     16     i?a    107    61     11      5:26.2    7862    82.56
P5     20     99     88     30     4       3:14.4    4681    49.19
P6     1?     79     62     29     4       2:55.9    4230    44.34

TOTAL         566    485    219    38      22:08.1   31990   338.66


Table 5-1. Costs of automatic program generation.

Columns five and six measure the size of the resulting COBOL programs in terms
of both lines of code and number of paragraphs produced. The lines produced
count is conservative: lines which label paragraphs or sections are not included
in the count. Furthermore, lines of excessive length which are continued on a
subsequent line are only counted as a single line.


One observation here is that the size of the problem statement is not a good
measure of the size of the resulting program nor of the generation cost. A
large problem statement may result in a relatively small program (as is the case
with P5) if the relations in the problem statement correlate highly with records
in the data base.


The final three columns of Table 5-1 indicate some direct measures of cost of
the DMLP. The first of those indicates run time in minutes and seconds; the
second column indicates the number of kilo-core-seconds used; and the last
column gives a total dollar charge for the usage as calculated on
Carnegie-Mellon's DEC-10 computer. The primary component of the charge
incurred by the program generator is the kilo-core-seconds used because it
uses a lot of storage. The Carnegie-Mellon charge is approximately $.01 per
kilo-core-second. This compares to an in-house charge of $.0007 per
kilo-core-second at First National City Bank and an outside vendor charge of
$.03 per kilo-core-second [Summers 1974], all for DEC-10's.

The cost measure has only been obtained for the generation of the last five
programs in the table. These five programs contain 219 lines of COBOL source
code (including DML statements) and cost a total of $338.66 in computer time to
generate. This gives a mean cost of $1.55 per line of code with a range of
$1.35 to $2.02 per line generated.
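
The per-line figures can be checked against the COBOL LINES and dollar columns of Table 5-1. The Python fragment below only repeats that arithmetic for P2 through P6; the variable names are chosen for the example.

```python
# (COBOL lines, dollar charge) for the five programs with measured cost.
table_5_1 = {
    "P2": (32, 64.70),
    "P3": (67, 97.87),
    "P4": (61, 82.56),
    "P5": (30, 49.19),
    "P6": (29, 44.34),
}

total_lines = sum(lines for lines, _ in table_5_1.values())
total_cost = sum(cost for _, cost in table_5_1.values())
mean_cost_per_line = total_cost / total_lines

cost_per_line = {prog: cost / lines for prog, (lines, cost) in table_5_1.items()}
lowest = min(cost_per_line.values())    # P4
highest = max(cost_per_line.values())   # P2

print(total_lines, round(total_cost, 2))
print(round(mean_cost_per_line, 2), round(lowest, 2), round(highest, 2))
```

The mean is the total charge divided by the total line count, so it is weighted by program size; the range is taken over the five individual ratios.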

It is tempting to compare the cost of $1.55 per line of code with equivalent
values measured for human programmers. Before I do so, some cautionary
statements are in order. The figure of $1.55 per line is a variable cost and
does not take into consideration the costs of development of the DMLP.
Secondly, the cost to derive the problem statement, a human cost, is not
included. Thirdly, it is certainly possible that the program obtained may turn
out to be unsatisfactory because of errors or omissions in the problem
statement. Such a condition will require a further iteration of program
generation, and each such iteration will add an additional $1.55 per line to the
cost of the final program. Obviously, an extension to the system which would
allow it to modify previously generated programs without a complete
regeneration would be of great value.

5.2 Cost Factors in Programming.

The cost of the DMLP will be compared with costs calculated from four different
sources. The predicted number of man months and computer usage for 1000
lines of COBOL code are given for each source in Table 5-2. A very detailed
study of cost factors was carried out at the System Development Corporation
and has been reported in [Nelson 1966; LaBolle 1966]. This study, also
discussed in [Sharpe 1969], is based on observations of 12 COBOL programming
projects. I have included both the minimum and the mean cost factors from the
SDC study.



            NELSON   BRANDON  DELANEY  BAKER   NELSON
            (mean)                     MILLS   (min)

MAN MO.     ^.83     7.0      6        1.72    .27
COMP HRS.   18.08    20.5     25       7.22    1.18

Table  5-2.  COBOL  cost  factors  for  1000  source  lines. 


The Brandon study [Brandon 1963] gives normative (standard) values for
resource expenditures for programming in COBOL on the IBM 7080. Brandon
provides an algorithm for calculating these values which is dependent on the
complexity as well as the size of the program. Brandon's algorithm has a fixed
cost component. As a result, the cost per line will be somewhat lower for
lengthier programs, all other things being equal. The figures given in Table 5-2
were linearly extrapolated from values obtained from the algorithm for a 280
line COBOL program of average complexity.

Delaney's algorithm [Delaney 1966] is simply based on a productivity rate of 10
lines of code per man day, plus a 25% additional cost for program checkout.
Computer time is calculated at the rate of 1 hour per man week. Delaney does
not indicate how his figures were obtained. The productivity rate quoted by
Delaney is applicable to programming in machine code. The SDC study found
that programmers generate COBOL code at a slower rate than machine code, so
Delaney's figures may be somewhat optimistic for COBOL programming.
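
Delaney's rule is simple enough to work through for the 1000-line case of
Table 5-2. The sketch below is illustrative only. The 5-day man week and the
21 working days per man month are assumptions of this sketch (the text gives
only the productivity and checkout rates), and the function name is
hypothetical.

```python
def delaney_estimate(lines, lines_per_day=10, checkout_rate=0.25,
                     days_per_week=5, days_per_month=21):
    """Return (man_months, computer_hours) under Delaney's rule.

    days_per_week and days_per_month are assumed conversion factors;
    the source states only the productivity and checkout rates.
    """
    coding_days = lines / lines_per_day
    total_days = coding_days * (1 + checkout_rate)  # add program checkout
    man_months = total_days / days_per_month
    computer_hours = total_days / days_per_week     # 1 hour per man week
    return man_months, computer_hours


man_months, computer_hours = delaney_estimate(1000)
print(round(man_months, 2), computer_hours)
```

Under these assumptions the estimate comes to roughly 6 man months and 25
computer hours, in line with the DELANEY column of Table 5-2.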

The fourth study [Baker and Mills 1973] reports the results obtained from a
single programming project at IBM. The computer hours used in this project
were not indicated, so the figure given in the table has been estimated using the
one-hour-per-man-week rule of thumb. Baker and Mills claim that the
improvement in productivity is due to special management (Chief Programmer
Teams) and programming (structured) techniques. Since the earlier SDC study
showed a very large performance variance, we cannot attach much significance
to the single observation provided by Baker and Mills.




All four of these sources considered only the programmer's task in calculating
the cost factors. Other costs, such as problem definition cost and design
iteration costs, were not included. Although the Brandon and Delaney
algorithms have a documentation cost component, the values in Table 5-2 do not
include this cost since the DMLP does not generate documentation.

Table 5-2 is fairly consistent with the DMLP as regards the basis for the
calculations. There are, however, some costs that will be incurred with the
DMLP that are not accounted for in Table 5-2. These costs consist of an
investment cost associated with the DMLP, and the cost to phrase the
information retrieval problem in terms of HI-IQ. These costs, especially the
latter, are not negligible, but empirical study is needed to determine actual
values.

5.3  Comparison of Dollar Costs.

By assigning some dollar costs to the factors in Table 5-2 it becomes possible to
directly compare the various cost predictions (experiences) with the costs
observed for the DMLP. Such a comparison is given in Table 5-3. The minimum
cost line is based on a man-month cost of $1000. The reasonable cost line is
based on $3000 per man-month (derived from an annual salary of $18,000 plus
100% for burden and overhead).




                      NELSON                          BAKER    NELSON
             DMLP     (mean)    BRANDON    DELANEY    MILLS    (min)

MIN COSTS     108      6,638      9,050      8,500    2,422    1,450
REAS COSTS  1,550     23,530     31,250     30,500    8,770    6,710

Table 5-3.  $ Cost for 1000 lines of COBOL in 1974.

Both lines in Table 5-3 also take into consideration the machine cost factor. It
is more difficult to determine appropriate charges for computer usage, because
the computer hour usage figures in Table 5-2 were obtained several years ago.
On the one hand, computer hours to implement 1000 lines of COBOL might be
less today because of the improved efficiency of computer hardware.
Conversely, debugging time might have increased because of the increased
complexity of the programming environment. Furthermore, because of
multi-programming, computer costs are no longer a simple function of CPU time.

As a minimum value, I will use a charge of $100 per 1963-1966 equivalent
computer hour. This assumes an efficiency improvement factor of 1/5. As a
reasonable value, I will use a charge of $500 per computer hour. This assumes
that the factors discussed above in combination with inflation have cancelled
each other out. Table 5-3 compares the minimum and reasonable dollar costs
based on the cost factors of Table 5-2 with the costs experienced with the
DMLP.
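
The manual-programming columns of Table 5-3 can be recomputed from the cost
factors and the two sets of rates. The sketch below does so; the names are
hypothetical. One input is an assumption of this sketch: the NELSON (min)
computer hours are entered as 11.8 rather than the 1.18 printed in Table 5-2,
because 11.8 is the value that reproduces both Table 5-3 entries for that
column. Which figure is correct cannot be settled from the text.

```python
# Cost factors for 1000 COBOL lines: (man months, computer hours).
# NELSON (min) computer hours are taken as 11.8 (assumption, see lead-in).
FACTORS = {
    "NELSON (mean)": (4.83, 18.08),
    "BRANDON": (7.0, 20.5),
    "DELANEY": (6.0, 25.0),
    "BAKER-MILLS": (1.72, 7.22),
    "NELSON (min)": (0.27, 11.8),
}

# (dollars per man month, dollars per computer hour)
MIN_RATES = (1000, 100)
REAS_RATES = (3000, 500)


def dollar_cost(factors, rates):
    """Combine labor and machine charges into one dollar figure."""
    man_months, computer_hours = factors
    per_month, per_hour = rates
    return round(man_months * per_month + computer_hours * per_hour)


for source, factors in FACTORS.items():
    print(source,
          dollar_cost(factors, MIN_RATES),
          dollar_cost(factors, REAS_RATES))
```

With these inputs every entry agrees with Table 5-3 except the BAKER-MILLS
minimum, which computes to 2,442 against the 2,422 shown in the table.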



The reasonable cost for the DMLP is based on the charge of $.01 per k-core
second actually experienced (the cost also includes minor charges for connect
time, etc.). The minimum cost figure is based on the charge of $.0007 per
k-core second used for the in-house DEC-10 at First National City Bank.

The DMLP has a dollar cost saving of 92% to 98% in the minimum cost frame. In
the reasonable cost frame, the DMLP reduces cost by 77% to 95%.
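
The savings range follows from Table 5-3 by comparing the DMLP with the
cheapest and the costliest manual estimate in each frame. The sketch below
reads the DMLP costs as 108 and 1,550 dollars; the function and variable names
are hypothetical.

```python
# Dollar costs per 1000 lines, read from Table 5-3.
DMLP = {"min": 108, "reas": 1550}
CHEAPEST_MANUAL = {"min": 1450, "reas": 6710}     # NELSON (min)
COSTLIEST_MANUAL = {"min": 9050, "reas": 31250}   # BRANDON


def saving_percent(dmlp_cost, manual_cost):
    """Percentage by which the DMLP undercuts a manual estimate."""
    return 100 * (1 - dmlp_cost / manual_cost)


for frame in ("min", "reas"):
    low = saving_percent(DMLP[frame], CHEAPEST_MANUAL[frame])
    high = saving_percent(DMLP[frame], COSTLIEST_MANUAL[frame])
    print(frame, round(low, 1), round(high, 1))
```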

This reduction in cost leaves a lot of room for absorption of the costs associated
with the DMLP. Note that the costs for the DMLP as actually experienced are
equivalent to the minimum charges applied to the most efficient project
observed by Nelson; i.e. a "super" programmer working for $12,000 a year,
incurring no burden or overhead.

5.4  Execution  Costs. 


The  costs  discussed  so  far  are  only  program  generation  costs  and  ignore 
execution  costs.  The  execution  of  computer  generated  code  may  be  more  or 
less  efficient  than  the  execution  of  programs  written  by  people.  We  would  like 
to  think  that  there  is  some  person  who  can  write  a better  program  than  a 
machine  for  any  problem.  But,  the  DMLP  need  only  be  better  than  the  average 
programmer  for  a net  advantage  to  result  from  using  it. 




5.5  Cost of Data Structure Design.

It is difficult to judge the cost savings from automatic structure design. First,
the automatic designer only partially replaces the human designer. Second,
this task is quite new and cost experience is not yet available. The cost of
automatic design seems quite low, however. The costs given in Table 5-4 were
obtained from the DEC-10 computer at Carnegie-Mellon University and are
based on the same charging algorithm as described above.

The design process occurs in two steps. Simple fact accumulation from the set
of queries is done first. Then the actual design is done from the accumulated
facts. The fact extraction process consists of reading the query and extracting
from it the implicit relationships it contains. Table 5-4 shows that 41 facts
were extracted from a set of six queries containing 55 statements at a cost of
$15.93. The costs given here were obtained from the example given in Chapter
Three.

The second stage of the design process determines a network structure that will
accommodate all of the necessary relations as captured in the facts base. This
process is more complex than fact accumulation and requires almost twice as
much time and a cost of $28.71.




INPUT     QTY    RUN TIME       $

STMNTS     55     1:15.65     15.93
FACTS      41     2:15.27     28.71
TOT               3:30.92     44.64

Table 5-4.  Cost of Generating a Design.


The savings potential of these systems is high enough to justify this and further
research.

A revision of the rules used by the DMLP so that they can be used by people is
suggested. This would provide a method of structured programming for data
base programmers. Baker and Mills attributed much of their success to
structured programming.




6 ON  THE  APPLICATION  OF  ARTIFICIAL  INTELLIGENCE. 

Most successful Artificial Intelligence (AI) programs exercise their problem
solving ability in unrealistic environments or apply it to a game playing task.
Many of these applications are also not objective since the problem environment
has been selected and constrained by the AI researcher. Actual replacement of
human problem solvers with the machine is usually of no value to anyone except
the AI community. Critics of AI can frequently stand unrefuted because much of
what they claim is true: practical and objective applications of AI are few and
far between.




To find in this criticism an indictment of AI is unrealistic and unfair. As many of
the proponents of AI have claimed in defense, it is necessary for AI to cut its
teeth on toy problems, games, and in otherwise limited environments before it
can proceed to fulfill its highly touted promises. Perhaps the only fair criticism is
that this teeth cutting is taking longer than envisioned by early workers in the
field.


It is my feeling that AI is entering an age where application becomes feasible.
By my statements here I do not mean to imply that there have not been
valuable spin-offs from AI that have contributed to advances in fields such as
computer science, linguistics, and psychology. However, there have not been
many direct applications of complete AI technologies (with perhaps the
exception of the pattern recognition field) to problem areas existing outside the
domain of AI, and especially outside the academic domain.

My own experience is perhaps a good demonstration of the coming of age of AI.
The background I bring to this research is not primarily in AI, but lies more in
the area of commercial data processing, data base management, and information
retrieval. My experience in these areas predates by several years my
introductory exposure to AI. I think this distinction is important because the
problems, which will be described shortly, existed for me long before I
discovered the technology to solve them. As such, my circumstances provided a
degree of objectivity not available to the AI researcher who is looking for
problems to adapt to his technology.

6.1  Data  Management  Application. 

This thesis describes a system which generates information retrieval programs
and data structure declarations. These two problem solving tasks have been
around for a long time, but growing commercial usage of network data bases has
suddenly increased their complexity. (When I refer to a network data base I do
not necessarily mean a data base on a computer network, but one in which a
data item may be linked to many other data items in rings, hierarchies or other
structures.) There has been an attempt within the United States and Canada to
develop a standard for network data bases through CODASYL which spawned
the inter-academia-industry Data Base Task Group (DBTG). This group has
produced a specification for a general purpose network data base system
[CODASYL 1971a]. Although not yet accepted as standard, it is believed by
many that the DBTG specification will be so adopted, perhaps by default since
many computer manufacturers and software firms are now selling
implementations of the DBTG specification.

Both applications described herein function in the data base environment as
specified by the DBTG. Data structures designed by the system conform to
DBTG specifications, and the generated retrieval programs are COBOL Procedure
Divisions containing Data Manipulation Language (DML) as defined by the DBTG.

This adherence to external specifications is another important characteristic that
distinguishes this application from other less practical and less objective
applications.

6.1.1  Programming in network data bases.

Introduction  of  the  network  type  of  data  base  brings  with  it  a new  level  of 
complexity  for  the  business  programmer.  As  Early  [Early  1972]  points  out, 
instead  of  the  usual  logical-physical  dichotomy  which  a programmer  traditionally 
deals  with,  there  is  a logical(a)-logical(b)-physical  trichotomy  facing  the  network 
data  base  programmer.  The  logical(a)  level  corresponds  to  data  relationships  as 
perceived  by  the  data  user.  The  logical(b)  level  concerns  the  actual  links 
between  the  data  in  the  data  base. 




Bachman recognizes this problem in the 1973 ACM Turing Award Lecture
[Bachman 1973] where he likens the programmer's problem (in the network data
base environment) to that of a navigator. The programmer is given the user's
desired data and relations; his task is to find the proper access path through the
myriad of connective links in the data base.

This problem, the programmer's problem, should be reduced. After all, much of
the problem is caused by an artifact introduced by the data network, the data
links. It should be noted here that others in the field, primarily Codd [Codd
1970, 1972a, 1972b], have proposed relational models of data bases whereby
they hope to avoid the emergence of this artifact. It is not clear to me,
however, how this can be done without exactly the kind of problem solving
program which is the subject of this paper.

Finding a path to certain pieces of data while satisfying certain constraints is
not at all dissimilar to a complicated robot/maze task. This similarity may be a
partial explanation for the direct applicability of AI technology to the problem.
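
The maze analogy can be pictured as a search over the links of the data base.
The sketch below is only an illustration of that analogy and is not the method
used by the system described here, which derives programs from rules. The
record types, the link table and the function name are all hypothetical.

```python
from collections import deque

# Hypothetical network structure: each record type maps to the record
# types reachable from it by following one link.
LINKS = {
    "HOSPITAL": ["DOCTOR", "PATIENT"],
    "DOCTOR": ["TREATMENT"],
    "PATIENT": ["TREATMENT"],
    "TREATMENT": [],
}


def access_path(links, start, goal):
    """Breadth-first search for a shortest chain of links from start to goal."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in links.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no chain of links connects the two record types


print(access_path(LINKS, "HOSPITAL", "TREATMENT"))
```

Changing the link table changes the path that is found, which is one way to
see why the same query can call for different programs under different data
base structures.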

The technology used is not robot technology, however. Instead, the solution
was found in the automatic programming area. Specifically, I used a compiler
developed by Buchanan and Luckham [Buchanan and Luckham 1974; Buchanan
1974] to create the data base retrieval program generator. The retrieval
program generator, consisting primarily of a set of Micro-Planner theorems
[Hewitt 1971; Sussman and Winograd 1972], was compiled by the APG system from
a set of rules. These rules are stated in a formalism corresponding to Hoare's
logic of programs [Hoare 1969]. Each rule describes a loop, an assignment
statement, a Data Manipulation Language statement, or other program construct.

A rule describes the effect of a program construct and the conditions under
which it may be used. For example, the rule for the GET statement states that
(a) it is not possible to GET a record unless its location in the data base is
known to the data management system and (b) following the GET, the record is
located in working storage. Other rules, especially ones dealing with the
assemblage of larger program blocks, do not bear such close resemblance to
sections of a programming manual as does the rule for GET. These other rules
capture my own programming knowledge which I had previously acquired and
developed as a professional programmer. It was a difficult exercise to express
my knowledge in the APG formalism.
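
The GET rule can be read as a precondition and a postcondition on the program
state, in the spirit of Hoare's logic. The sketch below is a loose Python
rendering under that reading. It is not the APG rule syntax, and the `Rule`
class and the fact names (`located`, `in_working_storage`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A program construct with its usage condition and its effect."""
    statement: str
    precondition: frozenset   # facts that must hold before the statement
    postcondition: frozenset  # facts that hold after the statement

    def applicable(self, state):
        # The rule may be used only if every precondition fact is in the state.
        return self.precondition <= state

    def apply(self, state):
        if not self.applicable(state):
            raise ValueError(f"cannot use {self.statement} in this state")
        return state | self.postcondition


def get_rule(record):
    """GET: the record must be located; afterwards it is in working storage."""
    return Rule(
        statement=f"GET {record}",
        precondition=frozenset({("located", record)}),
        postcondition=frozenset({("in_working_storage", record)}),
    )


rule = get_rule("PATIENT")
before = frozenset({("located", "PATIENT")})
after = rule.apply(before)
print(rule.applicable(frozenset()), ("in_working_storage", "PATIENT") in after)
```

A generator built on such rules would work backwards from the facts a query
requires to statements whose postconditions supply them.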

The  rules  are  translated  to  Micro-Planner  theorems.  The  resulting  program 
(consisting  of  Micro-Planner  theorems)  is  an  information  retrieval  program 
generator.  One  of  its  inputs  is  the  above  mentioned  data  base  structure 
definition,  the  other  input  is  a retrieval  query.  The  output  is  a COBOL 
Procedure  Division. 

The program generator is data base independent, and may generate different
programs for identical queries if the structure is different. This is why the data
base structure is one of the inputs to the program generation process. The
design of this structure is the subject of the next section.

6.1.2  Designing network data structures.

Another problem attendant to network data base management is data base
structure design, that is, the design of appropriate links between data
elements so that data relationships can be properly captured and reconstructed.

Since all such relationships must be implicitly contained in the set of all queries,
it should be possible to derive a data base structure from such a set.

The automatic data structure designer operates in exactly this fashion. It
generates a data structure design that is satisfactory for a set of queries.

The power of the design program lies in two areas. First, it has the ability to
recognize hierarchies and inversions thereof. Detection of a hierarchy that is
processed "both ways" leads to the inclusion of a confluent hierarchy in the
designed structure. For example, a confluent hierarchy would be used to store
information regarding doctors and patients since doctors can have several
patients, and conversely, patients can have several doctors.
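
The "both ways" test can be sketched as follows. This is an assumed
simplification and not the designer's actual rule set: each extracted fact is
reduced to an (owner, member) pair meaning that one owner has several members,
and a pair observed in both directions is flagged as confluent. All names are
hypothetical.

```python
def classify_hierarchies(one_to_many_facts):
    """Split (owner, member) facts into simple and confluent hierarchies.

    A hierarchy seen in both directions, e.g. (DOCTOR, PATIENT) and
    (PATIENT, DOCTOR), is reported once as a confluent hierarchy.
    """
    facts = set(one_to_many_facts)
    simple, confluent = set(), set()
    for owner, member in facts:
        if (member, owner) in facts:
            confluent.add(frozenset((owner, member)))
        else:
            simple.add((owner, member))
    return simple, confluent


facts = [
    ("DOCTOR", "PATIENT"),    # a doctor has several patients
    ("PATIENT", "DOCTOR"),    # a patient has several doctors
    ("HOSPITAL", "DOCTOR"),   # a hospital has several doctors
]
simple, confluent = classify_hierarchies(facts)
print(simple, confluent)
```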

The designer also determines the location of data items (attributes) in the
hierarchical structure. For example, hospital name can be a patient attribute
(where he is being treated), a doctor attribute (where he works), and of course
a hospital attribute. If hospital-patient and hospital-doctor hierarchies (several
patients and several doctors per hospital) have already been constructed by the
system, then it will assign the attribute in question to the hospital record. This
permits usage of hospital name as a unique attribute of hospital, doctor, and
patient.
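
One way to picture this placement rule: among the record types that use an
attribute, choose the one that owns all the others in the hierarchies built so
far. The sketch below is a guess at the spirit of the rule, handles only direct
ownership, and uses hypothetical names throughout.

```python
def place_attribute(users, hierarchies):
    """Return the record type that should hold an attribute, or None.

    users: record types whose queries mention the attribute.
    hierarchies: set of (owner, member) pairs already constructed.
    The attribute goes to the user that directly owns every other user.
    """
    for candidate in users:
        others = [u for u in users if u != candidate]
        if all((candidate, other) in hierarchies for other in others):
            return candidate
    return None  # no single owner among the users


hierarchies = {("HOSPITAL", "PATIENT"), ("HOSPITAL", "DOCTOR")}
users = ["PATIENT", "DOCTOR", "HOSPITAL"]
print(place_attribute(users, hierarchies))
```

Storing hospital name once on the owning record lets both member records
reach it by following their link to the owner.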

In Early's words, the system has found a logical(b) arrangement which
accommodates all logical(a) relationships. As a matter of fact, translation from
logical(a) to logical(b) also characterizes the system's activities as a programmer.
Further translation to the physical level is accomplished by the data management
system, e.g. an implementation of the DBTG specification. As was the case
with the program generator, the system is defined with a set of rules that are
translated by a compiler to a program of Micro-Planner theorems. However, in
this case the resulting system is not a code generator; it is a declarative
generator.

6.2  Acquisition and Representation of Knowledge.

If a system is to apply knowledge, it must first obtain such knowledge. The
ongoing work in "learning" systems is partially motivated by the desire to
minimize the effort of building the knowledge base for a system which must
understand a particular domain. Until generalized learning systems become
available which can be adapted in a practical way for use with a particular
application domain, it is necessary to imbed knowledge of the application either
by building a specific learner for that system or by providing the system
directly with the knowledge it needs. The trouble with the former approach is
that building the learner may turn out to be a more complex task than building
the rest of the system.


The latter approach is the one that was taken for the applications discussed
here. The rule formalism provides a very good framework for capturing
network data base programming knowledge. Even so, capturing the knowledge
was not a straightforward process and involved many trial-and-error iterations.


The occurrence of so much trial-and-error even when using this formalism is
perhaps surprising. The problem arises because much of the existing
knowledge (residing in the application specialist's head) does not have a well
defined structure. It is possible to map the specialist's knowledge to a
structured set of formal rules in several ways, all of which are valid, but some
of which lead to an extremely inefficient program generation process.

However, some of the knowledge does have a well defined structure, for
example the knowledge regarding GET as discussed above. In general, the rules
for single program statements are very similar to usage rules that might be
found in a reference manual. It is the rules that control the next higher level of
programming, the construction of program blocks, that are complex and not
easily derived.




Since the content of the rules determines the decision tree that controls the
search space, the choice of mappings mentioned above can have a dramatic
impact on the efficiency of the search for a program. If the use of a particular
rule occurs far out in the branches of the tree, and if the rule is a powerful
discriminator, then it would be appropriate to consider an alternative rule
structure that would permit earlier evaluation of the rule in question.


Notice that this makes it very important that the applications specialist be very
closely integrated with the AI technologist. Close integration will allow the
specialist's knowledge to be captured in a way that is most effective for the AI
methodology being used. This task is complicated if the AI representation
shows little correspondence to the "natural" representation.


In the particular application discussed in this paper, the integration was
accomplished by training the application specialist (myself) to become an AI
technologist. I can only imagine the problems that would result, especially
regarding efficiency, if there is less integration and the two functions are
embodied in different individuals.


Table 5-1 illustrates the impact of close integration on the efficiency of the
resulting application. For example, of the 8^1 rules that were tried for the
generation of program P2, only 7 were unnecessary. For all six programs, out
of 566 rules tried, 485 were successful. In other words, on average only 14%
of the search takes place in fruitless branches of the tree.
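
The share of fruitless search follows directly from the two totals quoted for
the six programs. A minimal check, with hypothetical variable names:

```python
rules_tried = 566
rules_successful = 485

# Rules that were tried but did not contribute to a generated program.
fruitless = rules_tried - rules_successful
fruitless_percent = 100 * fruitless / rules_tried
print(fruitless, round(fruitless_percent, 1))
```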


[Several words illegible in the source] of integration of the AI technologist and
the applications specialist [illegible] for the applications specialist to "come
to the machine" and [illegible] processes in AI technology. Conversely,
[illegible] acquire a great deal of knowledge about the application.




7 THE RELATIONAL AND NETWORK MODELS OF DATA BASES.

7.1  I itT.o  sly.- 'J  Oil- 

Several recent papers on relational and network data base models discuss these
models in a choice framework. That is, these papers make the assumption that
a choice of one model is to be made, and then argue the supremacy of one
model over the other. Codd and Date [Codd and Date 1974; Date and Codd
1974] recently argued for the relational model; others [Jardine 1974] have
argued the other way.

I find it unnecessary to argue this point. Obviously, the relational model
provides a much better applications interface and a better framework for the
discussion of data bases. But I ask the rhetorical question: How do proponents
of the relational model propose to implement general relational data bases
without utilizing some type of network structure? Contemplation of this question
reveals my central premise, which is best illustrated by analogy: the relational
model is to the network model as COBOL is to assembly language.

Viewing the two models in this fashion leads to the realisation that there is a
need for both models and suggests an approach whereby the relational model
can be developed and incorporated by users in an orderly fashion.


7.2  Levels of Data Structure Description.

Codd [Codd 1970] originally introduced the relational model as a means for
higher level data structure description. However, just as COBOL and other high
level languages must be translated to a machine level sequence of instructions,
so must relational representations be translated to access path descriptions.
Early [Early 1972] was the first to formalize the levels of description attendant
to data bases. Early defined three levels of description:

(a)  relational 

(b)  access  path 

(c)  implementation 

The  two  highest  levels  comprise,  in  Early’s  terms,  the  semantic  level  of 
description. 

Although the access path and implementation levels are usually invisible to the
user of a relational level model, the user should demand standardization at all
three levels, and especially at the access path level. If he does not get such
standardization, he will find that his data base is not transportable across
machines or across various implementations of the relational model.

There  are  many  possible  choices  of  models  at  the  access  path  level,  including 
sequential  file,  indexed  file  and  network  data  bases.  The  most  general  of  these 
is  the  network  model,  and  the  DBTG  report  represents  a standardization  of  this 
model. 

The network model, primarily as embodied in the DBTG report and its
predecessor, IDS [Bachman and Williams 1964], has enjoyed far greater
acceptance in implementation than has the relational model. This success is not
due to a flaw in the relational model, but is simply caused by the limitations of
technology. Since the relational model captures a higher level of generality, it
requires further advancements in technology for implementation. In terms of
the preceding analogy, assemblers preceded compilers for essentially similar
reasons.

7.3  Translation  Approach. 

Because of the increased acceptance and use of DBTG systems, and the need for
using a standard access path model, the next logical step in the development of
the relational model is to devise translators for the translation of relational level
descriptions to DBTG access path descriptions. This allows development of the
relational model on the base of existing DBTG implementations.

I call this the translation approach and differentiate it from the total approach.
The total approach is the approach taken with implementations of the relational
model to date. That is, each such implementation is self-contained, and access
path level and implementation level descriptions are hidden from the user. Such
implementations of the relational model therefore usually differ not only at the
user (relational) interface, but also in the storage structure (access paths) of the
data.




7.4  Advantages of translation.

The translation approach offers several advantages over the total approach:

(a)  protects  current  users  of  DBTG 

(b)  is  extensible  with  minimized  disruption 

(c)  permits  user  flexibility 

(d)  permits  orderly  development  of  the  relational  model 

Standardization at the access path level, as was urged above, is facilitated with
the translation approach. These advantages all accrue because the translation
approach is well defined (or locked-in) at the access path level.

Once a user's relational description has been translated to a DBTG access path
description (e.g. COBOL-DML program), this description then exists within a
fixed framework. Subsequent changes in the relational technology or standards
will not affect the user's operations. If he chooses to, the user can re-translate
selected relational descriptions to take advantage of such modifications.

If the necessary relational technology has not yet been developed for a certain
application, the user can always fall back on the access path model and manually
generate the necessary access path descriptions, or he can modify previously
translated access path descriptions so that his application is properly processed.
The same options are available for the improvement of inefficiencies caused by
the generality of the relational model.




7.5  Two Implemented Translators.

The DMLP and the automatic designer are translators. The HI-IQ language
(which both accept as input) allows users to specify queries without worrying
about the structure of the data base. HI-IQ is a relational language because it
lets the user specify queries in terms of the relationships he perceives.

Codd might not agree to this classification of HI-IQ because it is not based on
his relational calculus or algebra and because it does not permit the naming of
relations. This means that multiple direct relationships between two entities
cannot be specified because a name is needed to distinguish between them.

Such  multiple  direct  relationships  are  infrequent  in  commercial  processing.  In 
any  case,  extension  of  HI-IQ  and  the  two  systems  to  permit  the  naming  of 
relations  would  not  be  difficult. 
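
The naming limitation can be seen in a small Python sketch. The entity and
relation names here are invented for illustration and do not come from HI-IQ:
when a relationship is identified only by the pair of entities it connects, a
second relationship between the same pair cannot be told apart from the first.

```python
# Unnamed relationships: identified only by the pair of entities they connect.
unnamed = set()
unnamed.add(("EMPLOYEE", "PROJECT"))  # intended meaning: "works on"
unnamed.add(("EMPLOYEE", "PROJECT"))  # intended meaning: "manages" (collapses into the first)

# Named relationships: the name keeps the two apart.
named = {
    "WORKS-ON": ("EMPLOYEE", "PROJECT"),
    "MANAGES": ("EMPLOYEE", "PROJECT"),
}

def relationships_between(a, b, relations):
    """List the names of all relations that directly connect entities a and b."""
    return sorted(name for name, pair in relations.items() if pair == (a, b))
```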


BIBLIOGRAPHY. 




Bachman, C. W. (1965), "Software for Random Access Processing",
Datamation, April 1965, pp36-41.

Bachman, C. W. (1969), "Data Structure Diagrams", Data Base 1, 2,
Quarterly newsletter of ACM-SIGBDP.

Bachman, C. W. (1973), "The Programmer as Navigator", CACM 16, 11,
November 1973, pp653-658.

Bachman, C. W. and S. B. Williams (1964), "A General Purpose
Programming System for Random Access Memories", Proceedings, Fall
Joint Computer Conference 1964, pp411-422.

Baker, Terry F. and Harlan D. Mills (1973), "Chief Programmer
Teams", Datamation, December 1973, pp58-61.

Boyce, R. F., D. D. Chamberlin, W. F. King III, and M. M.
Hammer (1973), "Specifying Queries as Relational Expressions: Square",
IBM Research RJ 1291, October 1973.

Brandon, Dick H. (1963), Management Standards for Data Processing,
Van Nostrand Co., Princeton, N.J., 1963.

Buchanan, J. R. (1974), "A Study in Automatic Programming", PhD
Thesis, Stanford University.

Buchanan, J. R. and D. C. Luckham (1974), "On Automating the
Construction of Programs", Stanford AI Project Memo, Stanford
University.

Childs, D. L. (1968), "Feasibility of a set-theoretical data structure -
a general structure based on a reconstituted definition of a relation",
Proc. IFIP Congress 1968, North Holland Publishing Co., Amsterdam,
pp162-172.

CODASYL (1970), CODASYL COBOL Journal of Development, 1969,
available from ACM, New York City; IFIP Administration Data Processing
Group, Amsterdam; and British Computer Society, London.

CODASYL (1971a), CODASYL Data Base Task Group April 1971 Report,
available from ACM, New York City; IFIP Administration Data Processing
Group, Amsterdam; and British Computer Society, London.

CODASYL (1971b), Feature Analysis of Generalized Data Base
Systems, April 1971, available from ACM, New York City;
IFIP Administration Data Processing Group, Amsterdam; and British
Computer Society, London.




Codd, E. F. (1970), "A Relational Model of Data for Large Shared Data
Banks", CACM 13, 6, June 1970, pp377-387.

Codd, E. F. (1972a), "Further Normalization of the Data Base
Relational Model", in R. Rustin (Ed), Data Base Systems, Prentice Hall,
1972.

Codd, E. F. (1972b), "Relational Completeness of Data Base
Sublanguages", in R. Rustin (Ed), Data Base Systems, Prentice Hall,
1972.

Codd, E. F. and C. J. Date (1974), "Interactive Support for
Non-programmers: The Relational and Network Approaches", IBM
Research RJ-1400, June 6, 1974.

Dahl, O. J., E. W. Dijkstra, and C. A. R. Hoare (1972), Structured
Programming, Academic Press, New York.

Date, C. J. and E. F. Codd (1974), "The Relational and Network
Approaches: Comparison of the Application Programming Interfaces",
IBM Research RJ-1401, June 6, 1974.

Delaney, William A. (1966), "Predicting the Costs of Computer
Programs", Data Processing Magazine, October 1966, pp32-34.

Early, J. (1972), "On the Semantics of Data Structures", in R. Rustin
(Ed), Data Base Systems, Prentice Hall, 1972.

General Electric (1965), Introduction to Integrated Data Store,
Publication CPU-1048, General Electric Company Information Systems
Department, Phoenix, Arizona.

Gerritsen, R. (1974), "Automatically Generated Programs for
Information Retrieval: IRP, a Rudimentary System", Carnegie-Mellon
University, Graduate School of Industrial Administration,
W.P.-47-73-74.


Gibson, T. A. and P. F. Stockhausen (1973), "MASTER
LINKS-Hierarchical Data System", The Bell System Technical Journal
52, 10, December 1973, pp1691-1724.

Hewitt, C. (1971), "Description and Theoretical Analysis of Planner",
PhD Thesis, Massachusetts Institute of Technology.

Hoare, C. A. R. (1969), "An Axiomatic Basis for Computer
Programming", CACM 12, 10, October 1969, pp576-580.




Igarashi, S., R. L. London and D. C. Luckham (1973), "Automatic
Program Verification I: A Logical Basis and Implementation", Stanford
A.I. Memorandum 200, May 1973.

Informatics (1969), MARK IV File Management System Reference Manual,
Informatics Inc., Document No. SP-69-810-1.

Jardine, Donald A. (Ed) (1974), Data Base Management Systems,
Proceedings of the Share Working Conference on Data Base
Management Systems, Montreal, Canada, July 23-27, 1973,
North-Holland Publishing Company, 1974, pp157-160.

La Bolle, V. (1966), "Development of Equations for Estimating the
Costs of Computer Program Production", System Development
Corporation Technical Memorandum TM-2918/000/00, April 5, 1966.

Lavalee, P. A., S. Ohayon and R. Sauvain (undated), "DMS Data Base
Strategies for Interrogation and Update", Xerox, Technology Report.

McCarthy, J., P. W. Abrahams, D. J. Edwards, T. P. Hart and M.
I. Levin (1972), LISP 1.5 Programmer's Manual, M.I.T. Press.

McKeeman, W., J. J. Horning, and D. B. Wortman (1970), A Compiler
Generator, Prentice Hall, New Jersey.

Naur, P., et al. (1960), "Report on the Algorithmic Language ALGOL
60", CACM 3, p299.

Nelson,  E.  A.  (1966),  "Management  Handbook  for  the  Estimation  of 
Computer  Programming  Costs",  System  Development  Corporation 
Technical  Memorandum  TM-3225/000/00,  October  31,  1966. 

Newell, A., J. C. Shaw, and H. A. Simon (1960), "Report on a
General Problem-Solving Program for a Computer", Information
Processing, Proceedings, International Conference of Information
Processing, Paris: UNESCO, 1960, pp256-264.

Postley, John A. (1969), "General Purpose Systems: The MARK IV File
Management System", in Fred Gruenberger (Ed), Critical Factors in
Data Management, Prentice Hall, New Jersey, 1969.

Puerling, B. W. and J. T. Roberts (1973), "The Natural Dialogue
System", The Bell System Technical Journal 52, 10, December 1973,
pp1725-1741.

Sharpe, William F. (1969), The Economics of Computers, Columbia
University Press, New York, 1969.




Shefter, H. (1958), Faster Reading Self-Taught, Washington Square
Press, New York.

Simon, H. A. (1961), "Experiments with a Heuristic Compiler", the
RAND Corp., Report P-2349.

Simon, H. A. and Laurent Siklossy (1972), Representation and
Meaning: Experiments with Data Processing Systems, Prentice Hall,
Englewood Cliffs, N.J.

Summers, Allen L. (1974), "The Captive Computer Utility", Computer
Decisions 6, 1, January 1974.

Sussman, Gerald J. and Drew V. McDermott (1972), "Why Conniving is
Better than Planning", Massachusetts Institute of Technology, Artificial
Intelligence Memo 255A, April 1972.

Sussman, Gerald J. and Terry Winograd (1972), "Micro-planner
Reference Manual", M.I.T. Project MAC Report.

Teichroew, Daniel (1970), "Problem Statement Languages in MIS",
Proceedings, International Symposium of BIFOA, "Management
Information Systems - A Challenge to Scientific Research", Cologne,
July 1970, pp253-270.

Teichroew, Daniel and Hasan Sayani (1971), "Automation of System
Building", Datamation, August 15, 1971, pp25-30.

Waldinger,  R.  J.  and  R.  C.  T.  Lee  (1969),  "PROW:  A Step  Toward 
Automatic  Program  Writing",  presented  at  the  International  Joint 
Conference  on  Artificial  Intelligence,  1969. 

Xerox (1970), XDS Sigma Data Management System (DMS) Reference
Manual, Document 90-17-38 A, Xerox Data Systems, El Segundo,
California.


